2. Mathematical Foundations


2. Mathematical Foundations (Foundations of Statistical Natural Language Processing). July 10, 2001. Artificial Intelligence Lab, Sung Kyung-hee.

Contents – Part 1: 1. Elementary Probability Theory: conditional probability, Bayes' theorem, random variables, joint and conditional distributions, standard distributions.

Conditional probability (1/2) P(A) is the probability of the event A. The conditional probability of A given B is P(A|B) = P(A ∩ B) / P(B). Ex1> A coin is tossed 3 times: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. A = {HHT, HTH, THH} (exactly 2 heads), P(A) = 3/8; B = {HHH, HHT, HTH, HTT} (first toss a head), P(B) = 1/2. Then A ∩ B = {HHT, HTH}, so P(A|B) = (1/4) / (1/2) = 1/2.

Conditional probability (2/2) Multiplication rule: P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A). Chain rule: P(A1 ∩ … ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) … P(An|A1 ∩ … ∩ An−1). Two events A, B are independent if P(A ∩ B) = P(A) P(B), i.e. P(A|B) = P(A) whenever P(B) > 0.
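
A minimal Python sketch of Ex1 (variable names are mine): it enumerates the eight equally likely outcomes and checks the conditional probability, the multiplication rule, and the (non-)independence of A and B.

```python
from itertools import product
from fractions import Fraction

# Sample space of three coin tosses: 8 equally likely outcomes.
omega = [''.join(t) for t in product('HT', repeat=3)]

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w.count('H') == 2}   # exactly two heads
B = {w for w in omega if w[0] == 'H'}         # first toss is a head

p_A = prob(A)                  # 3/8
p_B = prob(B)                  # 1/2
p_A_given_B = prob(A & B) / p_B
print(p_A, p_B, p_A_given_B)   # 3/8 1/2 1/2

# Multiplication rule: P(A ∩ B) = P(A|B) P(B)
assert prob(A & B) == p_A_given_B * p_B
# A and B are not independent here, since P(A ∩ B) != P(A) P(B).
assert prob(A & B) != p_A * p_B
```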

Bayes’ theorem (1/2) Bayes’ theorem: P(B|A) = P(A|B) P(B) / P(A). Generally, if B1, …, Bn partition the sample space (the Bi are disjoint and their union is Ω), then P(A) = Σi P(A|Bi) P(Bi), and P(Bj|A) = P(A|Bj) P(Bj) / Σi P(A|Bi) P(Bi).

Bayes’ theorem (2/2) Ex2> G: the event of a sentence having a parasitic gap; T: the event of the test being positive. Even with an accurate test, P(G|T) remains small. This poor result comes about because the prior probability of a sentence containing a parasitic gap is so low.
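
A hedged sketch of the calculation behind Ex2. The slide does not give the numbers, so the prior and test accuracies below are illustrative assumptions; they only demonstrate why a very small prior keeps the posterior P(G|T) small.

```python
# Illustrative assumed values, not the slide's own numbers.
p_G = 0.00001           # assumed prior: probability a sentence has a parasitic gap
p_T_given_G = 0.95      # assumed sensitivity: P(test positive | gap)
p_T_given_notG = 0.005  # assumed false-positive rate: P(test positive | no gap)

# Bayes' theorem: P(G|T) = P(T|G) P(G) / P(T)
p_T = p_T_given_G * p_G + p_T_given_notG * (1 - p_G)
p_G_given_T = p_T_given_G * p_G / p_T
print(f"P(G|T) = {p_G_given_T:.4f}")  # ~0.0019: still tiny, because the prior is tiny
```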

Random variable Ex3> Random variable X for the sum of two dice, so S = {2, …, 12}; each of the 36 (first die, second die) pairs is equally likely. Probability mass function (pmf): p(x) = P(X = x), written X ~ p(x).
x:       2     3     4     5    6     7    8     9    10    11    12
p(X=x):  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36
Expectation: E(X) = Σx x p(x) = 7. Variance: Var(X) = E((X − E(X))^2) = E(X^2) − (E(X))^2 = 35/6. If X: Ω → {0, 1}, then X is called an indicator random variable or a Bernoulli trial.
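
A short sketch of Ex3 in Python, computing the pmf, expectation, and variance of the sum of two dice with exact fractions.

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely (first die, second die) pairs.
outcomes = list(product(range(1, 7), repeat=2))

# pmf of X = sum of the two dice.
pmf = {}
for d1, d2 in outcomes:
    pmf[d1 + d2] = pmf.get(d1 + d2, 0) + Fraction(1, 36)

E_X = sum(x * p for x, p in pmf.items())        # expectation
E_X2 = sum(x * x * p for x, p in pmf.items())
Var_X = E_X2 - E_X ** 2                          # Var(X) = E(X^2) - (E(X))^2
print(E_X, Var_X)                                # 7 35/6
```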

Joint and conditional distributions The joint pmf for two discrete random variables X, Y is p(x, y) = P(X = x, Y = y). Marginal pmfs total up the probability mass for the values of each variable separately: pX(x) = Σy p(x, y) and pY(y) = Σx p(x, y). Conditional pmf: pY|X(y|x) = p(x, y) / pX(x) for x such that pX(x) > 0.
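
A minimal sketch of these definitions; the joint table is a made-up toy example, not one from the slides.

```python
from collections import defaultdict

# Toy joint pmf p(x, y) over two binary-valued variables (made-up values).
joint = {('x1', 'y1'): 0.2, ('x1', 'y2'): 0.3,
         ('x2', 'y1'): 0.4, ('x2', 'y2'): 0.1}

p_X, p_Y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_X[x] += p   # marginal pX(x) = sum over y of p(x, y)
    p_Y[y] += p   # marginal pY(y) = sum over x of p(x, y)

def p_y_given_x(y, x):
    """Conditional pmf p(y|x) = p(x, y) / pX(x), defined when pX(x) > 0."""
    return joint.get((x, y), 0.0) / p_X[x]

print(dict(p_X), dict(p_Y), p_y_given_x('y1', 'x1'))  # p(y1|x1) = 0.2 / 0.5 = 0.4
```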

Standard distributions (1/3) Discrete distributions: the binomial distribution arises when one has a series of trials with only two outcomes, each trial being independent of all the others. The probability of r successes out of n trials, given that the probability of success in any trial is p, is b(r; n, p) = C(n, r) p^r (1 − p)^(n − r), where C(n, r) = n! / ((n − r)! r!) and 0 ≤ r ≤ n. Expectation: np; variance: np(1 − p).
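
A small sketch of the binomial pmf, with a numeric check that its mean and variance come out as np and np(1 − p); the values of n and p below are arbitrary.

```python
from math import comb

def binom_pmf(r, n, p):
    """b(r; n, p): probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3
mean = sum(r * binom_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean)**2 * binom_pmf(r, n, p) for r in range(n + 1))
print(mean, var)  # ~3.0 and ~2.1, i.e. np and np(1 - p)
```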

Standard distributions (2/3) Discrete distributions: The binomial distribution

Standard distributions (3/3) Continuous distributions: the normal distribution. For mean μ and standard deviation σ, the probability density function (pdf) is n(x; μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)^2 / (2σ^2)).
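
A one-function sketch of the normal pdf as written above.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """n(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

print(normal_pdf(0.0))  # ~0.3989, the peak of the standard normal density
```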

Contents – Part 2: 2. Essential Information Theory: entropy, joint entropy and conditional entropy, mutual information, the noisy channel model, relative entropy (Kullback-Leibler divergence).

Shannon’s Information Theory The goal is to maximize the amount of information that one can transmit over an imperfect communication channel such as a noisy phone line. It gives the theoretical maximum for data compression (the entropy H) and the theoretical maximum for the transmission rate (the channel capacity).

Entropy (1/4) The entropy H (or self-information) is the average uncertainty of a single random variable X: H(X) = −Σx p(x) log2 p(x), where p(x) is the pmf of X. Entropy is a measure of uncertainty: the more we know about something, the lower the entropy will be. We can use entropy as a measure of the quality of our models. Entropy measures the amount of information in a random variable, measured in bits.
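
A minimal entropy helper matching the formula above (probabilities of zero are skipped, since 0 log 0 is taken as 0).

```python
from math import log2

def entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), in bits; pmf is an iterable of probabilities."""
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit for a fair coin
print(entropy([0.9, 0.1]))  # ~0.469 bits: less uncertainty, lower entropy
```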

Entropy (2/4) The entropy of a weighted coin. The horizontal axis shows the probability of the coin coming up heads; the vertical axis shows the entropy of tossing the corresponding coin once.

Entropy (3/4) Ex7> The result of rolling an 8-sided die (uniform distribution): H(X) = −Σi=1..8 (1/8) log2(1/8) = 3 bits. Entropy is the average length of the message needed to transmit an outcome of that variable; in terms of the expectation E, H(X) = E(log2(1/p(X))). Here each outcome can be sent with a 3-bit code: 1→001, 2→010, 3→011, 4→100, 5→101, 6→110, 7→111, 8→000.

Entropy (4/4) Ex8> Simplified Polynesian. The per-letter probabilities are p: 1/8, t: 1/4, k: 1/8, a: 1/4, i: 1/8, u: 1/8, so H(P) = −Σi P(i) log2 P(i) = 2.5 bits. We can design a code that on average takes 2.5 bits to transmit a letter: p→100, t→00, k→101, a→01, i→110, u→111. Entropy can be interpreted as a measure of the size of the ‘search space’ consisting of the possible values of a random variable.
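
A quick numeric check of Ex7 and Ex8; the entropy helper from the previous sketch is repeated so the snippet runs on its own.

```python
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf if p > 0)

# Ex7: a uniform 8-sided die -> 3 bits, matching the 3-bit codes 000..111.
print(entropy([1/8] * 8))            # 3.0

# Ex8: Simplified Polynesian letters -> 2.5 bits per letter, matching the
# variable-length code (2 bits for t and a, 3 bits for the other letters).
polynesian = {'p': 1/8, 't': 1/4, 'k': 1/8, 'a': 1/4, 'i': 1/8, 'u': 1/8}
print(entropy(polynesian.values()))  # 2.5
```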

Joint entropy and conditional entropy (1/3) The joint entropy of a pair of discrete random variables X, Y ~ p(x, y) is H(X, Y) = −Σx Σy p(x, y) log2 p(x, y). The conditional entropy is H(Y|X) = Σx p(x) H(Y|X = x) = −Σx Σy p(x, y) log2 p(y|x). The chain rule for entropy: H(X, Y) = H(X) + H(Y|X).

Joint entropy and conditional entropy (2/3) Ex9> Simplified Polynesian revisited: all words consist of sequences of CV (consonant-vowel) syllables. Joint distribution P(C, V) with marginal probabilities (per-syllable basis):
          p      t      k      P(V=v)
a        1/16   3/8    1/16   1/2
i        1/16   3/16   0      1/4
u        0      3/16   1/16   1/4
P(C=c)   1/8    3/4    1/8
On a per-letter basis the probabilities are p: 1/16, t: 3/8, k: 1/16, a: 1/4, i: 1/8, u: 1/8.

Joint entropy and conditional entropy (3/3) From the joint distribution above: H(C) = −Σc P(c) log2 P(c) ≈ 1.061 bits, H(V|C) = −Σc Σv P(c, v) log2 P(v|c) = 1.375 bits, so by the chain rule H(C, V) = H(C) + H(V|C) ≈ 2.44 bits per syllable.
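
A sketch that recomputes these figures from the per-syllable joint distribution given two slides back (the table values are taken from that reconstruction, so the numbers are only as reliable as it is).

```python
from math import log2
from collections import defaultdict

# Joint distribution P(C, V) for Simplified Polynesian syllables (from the table above).
joint = {('p', 'a'): 1/16, ('t', 'a'): 3/8,  ('k', 'a'): 1/16,
         ('p', 'i'): 1/16, ('t', 'i'): 3/16, ('k', 'i'): 0,
         ('p', 'u'): 0,    ('t', 'u'): 3/16, ('k', 'u'): 1/16}

p_C = defaultdict(float)
for (c, v), p in joint.items():
    p_C[c] += p   # marginal over consonants

H_C = -sum(p * log2(p) for p in p_C.values() if p > 0)
H_CV = -sum(p * log2(p) for p in joint.values() if p > 0)
H_V_given_C = H_CV - H_C   # chain rule rearranged: H(V|C) = H(C,V) - H(C)
print(round(H_C, 3), round(H_V_given_C, 3), round(H_CV, 3))
# roughly 1.061, 1.375, 2.436 bits
```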

Mutual information (1/2) By the chain rule for entropy, H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y), so I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) is the mutual information between X and Y: the amount of information one random variable contains about another. It is symmetric and non-negative, and it is 0 only when the two variables are independent. It grows not only with the degree of dependence, but also according to the entropy of the variables. It is actually better to think of it as a measure of independence.

Mutual information (2/2) Since I(X; X) = H(X) − H(X|X) = H(X), entropy is also called self-information. In expanded form, I(X; Y) = Σx Σy p(x, y) log2 [p(x, y) / (p(x) p(y))]. Conditional MI and a chain rule: I(X; Y|Z) = H(X|Z) − H(X|Y, Z), and I(X1 … Xn; Y) = Σi I(Xi; Y | X1, …, Xi−1). Pointwise MI between two particular points x and y: I(x, y) = log2 [p(x, y) / (p(x) p(y))].
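
A small sketch of average and pointwise mutual information; the joint table is a made-up toy example with uniform marginals.

```python
from math import log2

# Toy joint pmf (made-up values); the marginals of both variables are uniform.
joint = {('x1', 'y1'): 0.4, ('x1', 'y2'): 0.1,
         ('x2', 'y1'): 0.1, ('x2', 'y2'): 0.4}
p_X = {'x1': 0.5, 'x2': 0.5}
p_Y = {'y1': 0.5, 'y2': 0.5}

def mutual_information(joint, p_X, p_Y):
    """I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ]."""
    return sum(p * log2(p / (p_X[x] * p_Y[y]))
               for (x, y), p in joint.items() if p > 0)

def pointwise_mi(x, y):
    """Pointwise MI for a single pair (x, y)."""
    return log2(joint[(x, y)] / (p_X[x] * p_Y[y]))

print(mutual_information(joint, p_X, p_Y))  # ~0.278 bits; 0 would mean independence
print(pointwise_mi('x1', 'y1'))             # log2(0.4 / 0.25) ~ 0.678
```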

Noisy channel model Channel capacity: the rate at which one can transmit information through the channel using an optimal input distribution, C = max over p(X) of I(X; Y). Binary symmetric channel (each bit is flipped with probability p): I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(p), which is maximized by a uniform input, giving C = 1 − H(p); since entropy is non-negative, C ≤ 1.
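
A sketch of the binary symmetric channel capacity C = 1 − H(p), where H is the binary entropy function.

```python
from math import log2

def binary_entropy(p):
    """H(p) for a Bernoulli variable; H(0) = H(1) = 0 by convention."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

print(bsc_capacity(0.0))  # 1.0: a noiseless binary channel carries 1 bit per use
print(bsc_capacity(0.1))  # ~0.531 bits per use
print(bsc_capacity(0.5))  # 0.0: the output tells us nothing about the input
```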

Relative entropy or Kullback-Leibler divergence Relative entropy for two pmfs p(x), q(x): D(p || q) = Σx p(x) log2 [p(x) / q(x)]. It is a measure of how close two pmfs are: it is non-negative, and D(p || q) = 0 iff p = q. Conditional relative entropy and chain rule: D(p(y|x) || q(y|x)) = Σx p(x) Σy p(y|x) log2 [p(y|x) / q(y|x)], and D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x)).
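
A minimal KL divergence helper; the two example pmfs are arbitrary, and the last two lines show that the divergence is not symmetric.

```python
from math import log2

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2[ p(x) / q(x) ]; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

p = {'a': 0.5, 'b': 0.5}
q = {'a': 0.9, 'b': 0.1}
print(kl_divergence(p, p))  # 0.0: identical distributions
print(kl_divergence(p, q))  # ~0.737 bits
print(kl_divergence(q, p))  # ~0.531 bits, showing the asymmetry
```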