
Probability and Information Theory


1 Probability and Information Theory

2 Random Variables
A random variable is a variable that can take on different values randomly; it is a description of the states that are possible. A random variable is denoted by a lowercase letter and may be discrete or continuous. Ex) P(x = 'yes')
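As an illustrative sketch (not part of the original slides), a discrete random variable can be simulated by drawing states according to some probabilities; the states and probabilities below are made up for the example:

```python
import numpy as np

# Hypothetical discrete random variable x with states 'yes'/'no'.
# The probabilities are illustrative only.
rng = np.random.default_rng(0)
states = ['yes', 'no']
probs = [0.3, 0.7]                     # P(x='yes') = 0.3, P(x='no') = 0.7
samples = rng.choice(states, size=10, p=probs)
print(samples)
```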

3 Probability Distributions
A probability distribution is a description of how likely a random variable or set of random variables is to take on each of its possible states.

4 Discrete Variables and Probability Mass Functions
Probability mass function (PMF)
A probability distribution over discrete variables may be described using a probability mass function (PMF), which maps from a state of a random variable to the probability of that random variable taking on that state. P(x = x): the probability that the random variable x takes on the state (value) x.
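A minimal sketch of a PMF as a state-to-probability mapping (the fair-die example is our own, not from the slides):

```python
# A PMF maps each state of a discrete random variable to its probability.
# Toy example: a fair six-sided die.
pmf = {face: 1 / 6 for face in range(1, 7)}
assert abs(sum(pmf.values()) - 1.0) < 1e-12   # a PMF must sum to 1
print(pmf[3])                                  # P(x = 3) = 1/6
```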

5 Discrete Variables and Probability Mass Functions
Joint probability distribution
P(x = x, y = y) denotes the probability that x = x and y = y simultaneously, often abbreviated as P(x, y).
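A joint PMF over two discrete variables can be stored as a 2-D table; the entries below are illustrative, not from the slides:

```python
import numpy as np

# Joint PMF P(x, y) as a 2-D table; rows index x, columns index y.
P_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])
assert np.isclose(P_xy.sum(), 1.0)   # joint probabilities sum to 1
print(P_xy[0, 1])                    # P(x = 0, y = 1) = 0.2
```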

6 Discrete Variables and Probability Mass Functions
To be a PMF on a random variable x, a function P must satisfy the following properties: the domain of P must be the set of all possible states of x; ∀x ∈ x, 0 ≤ P(x) ≤ 1; and Σ_x P(x) = 1 (the distribution is normalized).

7 Continuous Variables and Probability Density Functions
Probability Density Function (PDF)
When working with continuous random variables, we describe probability distributions using a probability density function (PDF) rather than a probability mass function. To be a PDF, a function p must satisfy the following properties: the domain of p must be the set of all possible states of x; ∀x ∈ x, p(x) ≥ 0 (note that we do not require p(x) ≤ 1); and ∫ p(x) dx = 1. A PDF p(x) does not give the probability of a specific state directly; instead, the probability of landing inside an infinitesimal region with volume δx is given by p(x)δx. We can integrate the density function to find the actual probability mass of a set of points: the probability that x lies in some set S is given by the integral of p(x) over that set, and in the univariate case the probability that x lies in the interval [a, b] is ∫[a,b] p(x) dx. For an example of a PDF, consider the uniform distribution u(x; a, b) = 1/(b − a) within [a, b], where a and b are the endpoints of the interval with b > a, and u(x; a, b) = 0 for all x outside [a, b], so there is no probability mass outside the interval. This density is nonnegative everywhere and integrates to 1. The ";" notation means "parametrized by": we consider x to be the argument of the function, while a and b are parameters that define it. We often denote that x follows the uniform distribution on [a, b] by writing x ∼ U(a, b).

8 Marginal Probability
Sometimes we know the probability distribution over a set of variables and we want to know the probability distribution over just a subset of them; this is known as the marginal probability distribution. For discrete variables it is computed with the sum rule, P(x = x) = Σ_y P(x = x, y = y), and for continuous variables with integration, p(x) = ∫ p(x, y) dy.
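A minimal sketch of marginalization by the sum rule, reusing the small illustrative joint table from earlier:

```python
import numpy as np

# Marginalize a discrete joint distribution with the sum rule:
# P(x) = sum_y P(x, y). Table values are illustrative only.
P_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])
P_x = P_xy.sum(axis=1)   # sum over y -> marginal of x
P_y = P_xy.sum(axis=0)   # sum over x -> marginal of y
print(P_x, P_y)          # [0.3 0.7] [0.4 0.6]
```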

9 Conditional Probability
The probability of some event, given that some other event has happened: P(y = y | x = x) = P(y = y, x = x) / P(x = x), defined only when P(x = x) > 0.
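A sketch of computing conditionals from a joint table (values are our own example):

```python
import numpy as np

# Conditional probability: P(y | x) = P(x, y) / P(x), for P(x) > 0.
P_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])
P_x = P_xy.sum(axis=1, keepdims=True)
P_y_given_x = P_xy / P_x          # each row now sums to 1
print(P_y_given_x[1])             # P(y | x = 1) = [3/7, 4/7]
```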

10 The Chain Rule of Conditional Probabilities
Any joint probability distribution over many random variables may be decomposed into conditional distributions over only one variable each. Chain rule: P(x(1), …, x(n)) = P(x(1)) ∏_{i=2..n} P(x(i) | x(1), …, x(i−1)).
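A toy numeric check of the two-variable case, P(x, y) = P(x) P(y | x), on the same illustrative table:

```python
import numpy as np

# Chain rule check: decomposing the joint and recombining recovers it.
P_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])
P_x = P_xy.sum(axis=1, keepdims=True)
P_y_given_x = P_xy / P_x
assert np.allclose(P_x * P_y_given_x, P_xy)
```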

11 Independence and Conditional Independence
Two random variables x and y are independent if their joint distribution factorizes: P(x = x, y = y) = P(x = x) P(y = y) for all x and y. They are conditionally independent given z if P(x = x, y = y | z = z) = P(x = x | z = z) P(y = y | z = z) for all x, y, and z.
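A sketch testing independence of a discrete joint table against the product of its marginals (tables are our own examples):

```python
import numpy as np

# x and y are independent iff P(x, y) = P(x) * P(y) for every cell.
P_indep = np.outer([0.3, 0.7], [0.4, 0.6])    # built to factorize
P_dep = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

def is_independent(P):
    return np.allclose(P, np.outer(P.sum(axis=1), P.sum(axis=0)))

print(is_independent(P_indep), is_independent(P_dep))   # True False
```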

12 Expectation
The expectation or expected value of some function f(x) with respect to a probability distribution P(x) is the average or mean value that f takes on when x is drawn from P: for discrete variables, E[f(x)] = Σ_x P(x) f(x); for continuous variables, E[f(x)] = ∫ p(x) f(x) dx.
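A minimal sketch comparing the exact discrete expectation with a Monte Carlo estimate (distribution and f are our own choices):

```python
import numpy as np

# E[f(x)] = sum_x P(x) f(x) for a discrete distribution.
x_vals = np.array([1, 2, 3])
P = np.array([0.2, 0.5, 0.3])
f = lambda x: x ** 2
exact = np.sum(P * f(x_vals))                 # 0.2*1 + 0.5*4 + 0.3*9 = 4.9

# The same expectation estimated by sampling x from P.
rng = np.random.default_rng(0)
samples = rng.choice(x_vals, size=100_000, p=P)
print(exact, f(samples).mean())               # estimate approaches 4.9
```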

13 Expectation
Expectations are linear: E[αf(x) + βg(x)] = αE[f(x)] + βE[g(x)].
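A quick numeric check of linearity on the same toy distribution (our own example):

```python
import numpy as np

# E[a*f(x) + b*g(x)] = a*E[f(x)] + b*E[g(x)]
x_vals = np.array([1, 2, 3])
P = np.array([0.2, 0.5, 0.3])
f, g = lambda x: x, lambda x: x ** 2
a, b = 2.0, -0.5
lhs = np.sum(P * (a * f(x_vals) + b * g(x_vals)))
rhs = a * np.sum(P * f(x_vals)) + b * np.sum(P * g(x_vals))
assert np.isclose(lhs, rhs)
```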

14 Variance
A measure of how much the values of a function of a random variable x vary as we sample different values of x from its probability distribution: Var(f(x)) = E[(f(x) − E[f(x)])²]. The square root of the variance is the standard deviation.
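A sketch measuring variance from samples (the Gaussian source is an assumption for the demo):

```python
import numpy as np

# Var(x) = E[(x - E[x])^2], estimated from samples.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)      # illustrative: sigma = 2
var_x = np.mean((x - x.mean()) ** 2)
print(var_x)                                 # approaches sigma^2 = 4
```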

15 Covariance Gives some sense of how much two values are linearly related to each other

16 Bernoulli Distribution
A distribution over a single binary random variable, controlled by a parameter φ ∈ [0, 1]: P(x = 1) = φ and P(x = 0) = 1 − φ.
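A minimal sketch of sampling a Bernoulli variable (φ = 0.3 is an arbitrary choice):

```python
import numpy as np

# Bernoulli: P(x = 1) = phi, P(x = 0) = 1 - phi.
phi = 0.3                                   # illustrative parameter
rng = np.random.default_rng(0)
x = (rng.random(100_000) < phi).astype(int)
print(x.mean())   # sample mean approaches E[x] = phi = 0.3
```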

17 Multinoulli Distribution
The multinoulli or categorical distribution is a distribution over a single discrete variable with k different states, where k is finite. It is parametrized by a vector p ∈ [0, 1]^(k−1), where p_i gives the probability of the i-th state. The final, k-th state's probability is given by 1 − 1⊤p.
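A sketch of the slide's parametrization: store only the first k−1 probabilities and recover the last (k = 3 and the values are illustrative):

```python
import numpy as np

# Multinoulli over k states, parametrized by the first k-1 probabilities.
p = np.array([0.2, 0.5])                    # illustrative, k = 3
full_p = np.append(p, 1.0 - p.sum())        # [0.2, 0.5, 0.3]
rng = np.random.default_rng(0)
samples = rng.choice(len(full_p), size=100_000, p=full_p)
print(np.bincount(samples) / samples.size)  # empirical frequencies ~ full_p
```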

18 Gaussian Distribution
The most commonly used distribution over real numbers, also known as the normal distribution: N(x; μ, σ²) = √(1/(2πσ²)) exp(−(x − μ)²/(2σ²)).
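A minimal sketch evaluating this density directly from the formula:

```python
import numpy as np

# Gaussian density N(x; mu, sigma^2), written out from the formula above.
def gaussian_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

print(gaussian_pdf(0.0))   # peak of the standard normal: ~0.3989
```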

19 Gaussian Distribution

20 Gaussian Distribution
Multivariate normal distribution
N(x; μ, Σ) = √(1/((2π)ⁿ det(Σ))) exp(−½ (x − μ)⊤ Σ⁻¹ (x − μ)), where μ is the mean vector and Σ is the (positive definite) covariance matrix.
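A sketch of sampling from a multivariate normal and checking the sample moments (μ and Σ are arbitrary illustrative values):

```python
import numpy as np

# Sampling from N(x; mu, Sigma) and verifying the empirical moments.
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])             # must be positive definite
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
print(X.mean(axis=0))      # ~ mu
print(np.cov(X.T))         # ~ Sigma
```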

21 Exponential distribution
In the context of deep learning, we often want to have a probability distribution with a sharp point at x = 0. To accomplish this, we can use the exponential distribution: p(x; λ) = λ 1(x ≥ 0) exp(−λx), where the indicator 1(x ≥ 0) assigns zero probability to all negative values of x.
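A sketch of sampling from the exponential distribution; note that numpy parametrizes it by the scale 1/λ (λ = 2 is an arbitrary choice):

```python
import numpy as np

# Exponential distribution p(x; lam) = lam * exp(-lam * x) for x >= 0.
lam = 2.0                                   # illustrative rate
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0 / lam, size=100_000)  # numpy uses scale = 1/lam
print(x.mean())            # sample mean approaches 1/lam = 0.5
print((x >= 0).all())      # all mass is at x >= 0
```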

22 Laplace distribution
A closely related distribution that lets us place a sharp peak of probability mass at an arbitrary point μ: Laplace(x; μ, γ) = (1/(2γ)) exp(−|x − μ| / γ).
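A minimal sketch of sampling from the Laplace distribution (μ = 0, γ = 1 are arbitrary choices):

```python
import numpy as np

# Laplace(x; mu, gamma) = (1 / (2 * gamma)) * exp(-|x - mu| / gamma).
mu, gamma = 0.0, 1.0                        # illustrative parameters
rng = np.random.default_rng(0)
x = rng.laplace(loc=mu, scale=gamma, size=100_000)
print(np.median(x))        # the sharp peak sits at mu
```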

23 Mixtures of Distributions
A mixture distribution is made up of several component distributions: on each trial, a component is chosen by sampling its identity c from a multinoulli distribution, giving P(x) = Σ_c P(c = c) P(x | c = c).
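A sketch of the two-step sampling process, using two illustrative Gaussian components (weights, means, and scales are our own):

```python
import numpy as np

# Mixture sampling: draw a component c from a multinoulli, then x from it.
weights = np.array([0.3, 0.7])              # P(c)
means, stds = np.array([-2.0, 3.0]), np.array([0.5, 1.0])
rng = np.random.default_rng(0)
c = rng.choice(2, size=100_000, p=weights)  # component identities
x = rng.normal(means[c], stds[c])           # x | c
print(x.mean())    # ~ sum_c P(c) * mean_c = 0.3*(-2) + 0.7*3 = 1.5
```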

24 Bayes’ Rule
We often need to compute P(x | y) when we know P(y | x). Bayes' rule gives P(x | y) = P(x) P(y | x) / P(y), where the denominator can be computed as P(y) = Σ_x P(y | x) P(x).
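A minimal numeric sketch of Bayes' rule over a binary x (the prior and likelihood values are illustrative):

```python
import numpy as np

# Bayes' rule: P(x | y) = P(x) * P(y | x) / P(y),
# with P(y) = sum_x P(y | x) * P(x).
P_x = np.array([0.01, 0.99])                # prior over x = {1, 0}
P_y_given_x = np.array([0.95, 0.05])        # P(y = 1 | x) for each x
P_y = np.sum(P_y_given_x * P_x)             # = 0.95*0.01 + 0.05*0.99
P_x_given_y = P_x * P_y_given_x / P_y
print(P_x_given_y)   # posterior over x given y = 1, sums to 1
```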


