Expectation for multivariate distributions
Definition: Let X1, X2, …, Xn denote n jointly distributed random variables with joint density function f(x1, x2, …, xn). Then for any function g,
E[g(X1, …, Xn)] = ∫ … ∫ g(x1, …, xn) f(x1, …, xn) dx1 … dxn
(with the integrals replaced by sums when the variables are discrete).
Example: Let X, Y, Z denote 3 jointly distributed random variables with a given joint density function f(x, y, z). Determine E[XYZ].
Solution:
E[XYZ] = ∫∫∫ xyz f(x, y, z) dx dy dz.
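As a numerical sketch of this kind of computation (the slide's own density is not reproduced here, so we assume, purely for illustration, the hypothetical density f(x, y, z) = 8xyz on the unit cube, which integrates to 1), a midpoint-rule triple integral recovers E[XYZ]:

```python
# Hypothetical density for illustration: f(x, y, z) = 8xyz on [0,1]^3.
# Analytically E[XYZ] = ∫∫∫ xyz · 8xyz dx dy dz = 8 · (1/3)^3 = 8/27.
n = 60
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h                      # midpoint in the x direction
    for j in range(n):
        y = (j + 0.5) * h
        for k in range(n):
            z = (k + 0.5) * h
            total += (x * y * z) * (8 * x * y * z) * h**3
print(total)                               # close to 8/27 ≈ 0.2963
```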
Some Rules for Expectation
1. E[Xi] = ∫ xi fi(xi) dxi, where fi is the marginal density of Xi.
Thus you can calculate E[Xi] either from the joint distribution of X1, …, Xn or from the marginal distribution of Xi.
Proof: integrate xi against the joint density and carry out the integration over the other variables first:
E[Xi] = ∫ … ∫ xi f(x1, …, xn) dx1 … dxn = ∫ xi [ ∫ … ∫ f(x1, …, xn) dx1 … dxi-1 dxi+1 … dxn ] dxi = ∫ xi fi(xi) dxi.
2. (The Linearity property)
E[a1X1 + … + akXk] = a1E[X1] + … + akE[Xk].
In the simple case when k = 2: E[aX + bY] = aE[X] + bE[Y].
Proof: follows from linearity of the integral (or sum) against the joint density.
3. (The Multiplicative property) Suppose X1, …, Xq are independent of Xq+1, …, Xk. Then
E[g(X1, …, Xq) h(Xq+1, …, Xk)] = E[g(X1, …, Xq)] E[h(Xq+1, …, Xk)].
In particular, if X and Y are independent, E[XY] = E[X]E[Y].
Proof: under independence the joint density factors, so the multiple integral factors into the product of the two expectations.
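The multiplicative property can be checked exactly on a small discrete example; here we take two independent fair dice (an illustrative choice, not from the slides), where the joint pmf factors and so E[XY] = E[X]·E[Y]:

```python
from fractions import Fraction

# Two independent fair dice: joint pmf is the product (1/6)(1/6) = 1/36.
faces = range(1, 7)
p = Fraction(1, 36)                       # joint probability of each (x, y) pair
e_xy = sum(x * y * p for x in faces for y in faces)
e_x = sum(x * Fraction(1, 6) for x in faces)
print(e_xy, e_x * e_x)                    # both equal 49/4 = (3.5)^2
```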
Some Rules for Variance
1. Var(X) = E[(X - μX)²] = E[X²] - μX², where μX = E[X].
Proof: E[(X - μX)²] = E[X² - 2μX X + μX²] = E[X²] - 2μX E[X] + μX². Thus Var(X) = E[X²] - μX².
2. Var(aX + b) = a² Var(X).
3. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), where Cov(X, Y) = E[(X - μX)(Y - μY)].
Note: If X and Y are independent, then Cov(X, Y) = 0 and Var(X + Y) = Var(X) + Var(Y).
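Rule 3 can be verified exactly on a small joint pmf; the one below is a hypothetical example chosen so that X and Y are dependent, which makes the covariance term visibly necessary:

```python
# Hypothetical joint pmf for (X, Y) on {0,1}^2, dependent by construction.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

var_x = E(lambda x, y: x**2) - E(lambda x, y: x)**2
var_y = E(lambda x, y: y**2) - E(lambda x, y: y)**2
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
var_sum = E(lambda x, y: (x + y)**2) - E(lambda x, y: x + y)**2
print(var_sum, var_x + var_y + 2 * cov)   # both sides agree
```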
Definition: For any two random variables X and Y, define the correlation coefficient ρXY to be:
ρXY = Cov(X, Y) / (σX σY).
If X and Y are independent, then Cov(X, Y) = 0 and hence ρXY = 0.
Properties of the correlation coefficient ρXY: if X and Y are independent, then ρXY = 0. The converse is not necessarily true, i.e. ρXY = 0 does not imply that X and Y are independent.
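A standard counterexample for the converse (illustrative, not from the slides): take X uniform on {-1, 0, 1} and Y = X². Then Cov(X, Y) = E[X³] - E[X]E[X²] = 0, yet Y is completely determined by X:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is a function of X, so they are dependent.
pmf = {x: Fraction(1, 3) for x in (-1, 0, 1)}
e_x = sum(x * p for x, p in pmf.items())                     # E[X] = 0
e_x2 = sum(x**2 * p for x, p in pmf.items())                 # E[X^2] = 2/3
cov = sum(x * x**2 * p for x, p in pmf.items()) - e_x * e_x2 # E[X^3] - E[X]E[X^2]
print(cov)                                                   # 0, so rho = 0
# Yet P[X=1, Y=1] = 1/3 while P[X=1]·P[Y=1] = (1/3)(2/3): not independent.
print(Fraction(1, 3), Fraction(1, 3) * Fraction(2, 3))
```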
More properties of the correlation coefficient ρXY:
-1 ≤ ρXY ≤ 1, and |ρXY| = 1 if and only if there exist a and b such that P[Y = a + bX] = 1, where ρXY = +1 if b > 0 and ρXY = -1 if b < 0.
Proof: Let
g(b) = E[((Y - μY) - b(X - μX))²] ≥ 0 for all b.
Expanding,
g(b) = σY² - 2b Cov(X, Y) + b² σX².
Consider choosing b to minimize g(b): setting g′(b) = -2 Cov(X, Y) + 2b σX² = 0 gives bmin = Cov(X, Y)/σX².
Since g(b) ≥ 0, then g(bmin) ≥ 0, or
σY² - Cov(X, Y)²/σX² ≥ 0.
Hence Cov(X, Y)² ≤ σX² σY², i.e. ρXY² ≤ 1, or -1 ≤ ρXY ≤ 1.
Note: ρXY² = 1 if and only if g(bmin) = 0, i.e. E[((Y - μY) - bmin(X - μX))²] = 0. This will be true if P[Y = a + bmin X] = 1 with a = μY - bmin μX, i.e. Y is a linear function of X, and the sign of ρXY is the sign of bmin.
Summary: -1 ≤ ρXY ≤ 1, and ρXY = ±1 if and only if there exist a and b such that P[Y = a + bX] = 1, where ρXY = +1 if b > 0 and ρXY = -1 if b < 0.
Proof (of the "if" direction): If Y = a + bX, then Cov(X, Y) = b Var(X) = bσX² and σY = |b|σX. Thus
ρXY = bσX² / (σX · |b|σX) = b/|b| = ±1, with the sign of b.
Some Applications (Rules of Expectation & Variance)
Let X1, …, Xn be n mutually independent random variables each having mean μ and standard deviation σ (variance σ²). Let
x̄ = (X1 + … + Xn)/n.
Then
E[x̄] = (E[X1] + … + E[Xn])/n = nμ/n = μ.
Also
Var(x̄) = (Var(X1) + … + Var(Xn))/n² = nσ²/n² = σ²/n,
or σx̄ = σ/√n. Thus E[x̄] = μ and σx̄ = σ/√n. Hence the distribution of x̄ is centered at μ and becomes more and more compact about μ as n increases.
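A small simulation sketch of the σ²/n shrinkage (the distribution and sample sizes below are illustrative choices): for draws uniform on {1, …, 6}, μ = 3.5 and σ² = 35/12, and the empirical variance of x̄ tracks σ²/n:

```python
import random

random.seed(0)

def var_of_mean(n, reps=20000):
    """Empirical variance of the sample mean of n die rolls, over many replications."""
    vals = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]
    m = sum(vals) / reps
    return sum((v - m)**2 for v in vals) / reps

for n in (1, 4, 16):
    print(n, var_of_mean(n), 35 / 12 / n)   # empirical vs. theoretical sigma^2/n
```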
Tchebychev’s Inequality
Let X denote a random variable with mean μ = E(X) and variance Var(X) = E[(X - μ)²] = σ². Then for any k > 0,
P[|X - μ| ≥ kσ] ≤ 1/k².
Note: σ = √Var(X) is called the standard deviation of X.
Proof (continuous case):
σ² = E[(X - μ)²] = ∫ (x - μ)² f(x) dx ≥ ∫{|x-μ| ≥ kσ} (x - μ)² f(x) dx ≥ (kσ)² ∫{|x-μ| ≥ kσ} f(x) dx = k²σ² P[|X - μ| ≥ kσ].
Dividing both sides by k²σ² gives P[|X - μ| ≥ kσ] ≤ 1/k².
Tchebychev's inequality is very conservative:
k = 1: P[|X - μ| ≥ σ] ≤ 1/1² = 1
k = 2: P[|X - μ| ≥ 2σ] ≤ 1/2² = 1/4
k = 3: P[|X - μ| ≥ 3σ] ≤ 1/3² = 1/9
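To see how conservative the bound is, compare it against the exact tail probability of a standard normal (a sketch; the normal is just one convenient distribution with known tails, computable from the error function):

```python
import math

# Tchebychev bound 1/k^2 vs. the exact normal tail P[|Z| >= k] = 2(1 - Phi(k)),
# where Phi(k) = 0.5 * (1 + erf(k / sqrt(2))).
for k in (1, 2, 3):
    bound = 1 / k**2
    exact = 2 * (1 - 0.5 * (1 + math.erf(k / math.sqrt(2))))
    print(k, round(bound, 4), round(exact, 4))
```

For k = 2 the bound is 0.25 while the exact normal tail is about 0.0455; for k = 3 the gap is even wider.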
The Law of Large Numbers
Let X1, …, Xn be n mutually independent random variables each having mean μ and variance σ². Then for any ε > 0 (no matter how small),
P[|x̄ - μ| < ε] = P[μ - ε < x̄ < μ + ε] → 1 as n → ∞.
Proof: We will use Tchebychev's inequality, which states that for any random variable X,
P[|X - μ| ≥ kσ] ≤ 1/k².
Now E[x̄] = μ and Var(x̄) = σ²/n, so σx̄ = σ/√n. Applying Tchebychev's inequality to x̄ with kσx̄ = ε, i.e. k = ε√n/σ:
P[|x̄ - μ| ≥ ε] ≤ 1/k² = σ²/(nε²) → 0 as n → ∞.
Thus P[μ - ε < x̄ < μ + ε] = 1 - P[|x̄ - μ| ≥ ε] → 1 as n → ∞.
Thus the Law of Large Numbers states that x̄ converges to the mean μ.
A Special case: Let X1, …, Xn be n mutually independent random variables each having a Bernoulli distribution with parameter p:
Xi = 1 if trial i is a success (probability p), Xi = 0 if trial i is a failure (probability q = 1 - p).
Then x̄ = (X1 + … + Xn)/n is the proportion of successes, and E[Xi] = p.
Thus the Law of Large Numbers states that the proportion of successes converges to the probability of success p.
Some people misinterpret this to mean that if the proportion of successes is currently lower than p, then the proportion of successes in the future will have to be larger than p to counter this and ensure that the Law of Large Numbers holds true. Of course, if in the infinite future the proportion of successes is p, then this is enough to ensure that the Law of Large Numbers holds true.
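A simulation sketch of the Bernoulli special case (p = 0.3 is an arbitrary illustrative value): the running proportion of successes settles near p. Early deviations are not "corrected" by later trials; they are simply diluted by the growing denominator:

```python
import random

random.seed(1)
p = 0.3
successes = 0
for n in range(1, 100001):
    successes += random.random() < p        # one Bernoulli(p) trial
    if n in (10, 1000, 100000):
        print(n, successes / n)             # running proportion drifts toward p
```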
Some more applications: Rules of Expectation and Rules of Variance
The mean and variance of a Binomial random variable
We have already computed these by other methods:
1. Using the probability function p(x).
2. Using the moment generating function mX(t).
Suppose that we have observed n independent repetitions of a Bernoulli trial. Let X1, …, Xn be n mutually independent random variables each having a Bernoulli distribution with parameter p and defined by
Xi = 1 if repetition i is a success (probability p), Xi = 0 if repetition i is a failure (probability q = 1 - p).
Then E[Xi] = p and Var(Xi) = p(1 - p) = pq.
Now X = X1 + … + Xn has a Binomial distribution with parameters n and p. X is the total number of successes in the n repetitions, and
E[X] = E[X1] + … + E[Xn] = np,
Var(X) = Var(X1) + … + Var(Xn) = npq.
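These formulas can be confirmed by summing directly over the Binomial probability function (the values n = 10, p = 0.3 are illustrative):

```python
from math import comb

# X ~ Binomial(n, p): check E[X] = np and Var(X) = np(1-p) against the pmf.
n, p = 10, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2
print(mean, var)   # np = 3.0 and np(1-p) = 2.1
```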
The mean and variance of a Hypergeometric distribution
The hypergeometric distribution arises when we sample without replacement n objects from a population of N = a + b objects. The population is divided into two groups (group A and group B). Group A contains a objects while group B contains b objects. Let X denote the number of objects in the sample of n that come from group A. The probability function of X is:
p(x) = C(a, x) C(b, n - x) / C(N, n).
Then let X1, …, Xn be n random variables defined by
Xi = 1 if the i-th object selected comes from group A, and Xi = 0 otherwise,
so that X = X1 + … + Xn.
Proof that P[Xi = 1] = a/N: by symmetry, each of the N objects is equally likely to be the i-th one selected, and a of them belong to group A. Therefore
E[Xi] = P[Xi = 1] = a/N and Var(Xi) = (a/N)(1 - a/N) = (a/N)(b/N).
Thus
E[X] = E[X1] + … + E[Xn] = n(a/N).
Also
Var(X) = Σ Var(Xi) + 2 Σ{i<j} Cov(Xi, Xj),
so we need to also calculate Cov(Xi, Xj). Note:
E[Xi Xj] = P[Xi = 1, Xj = 1] = (a/N)((a - 1)/(N - 1)),
and thus
Cov(Xi, Xj) = E[Xi Xj] - E[Xi]E[Xj] = (a/N)((a - 1)/(N - 1)) - (a/N)² = -(a/N)(b/N)/(N - 1).
Thus
Var(X) = n(a/N)(b/N) + 2 C(n, 2) [-(a/N)(b/N)/(N - 1)]
= n(a/N)(b/N)[1 - (n - 1)/(N - 1)] = n(a/N)(b/N)(N - n)/(N - 1).
Thus if X has a hypergeometric distribution with parameters a, b and n, then with p = a/N and N = a + b:
E[X] = np and Var(X) = np(1 - p)(N - n)/(N - 1).
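A direct check against the hypergeometric probability function (a = 6, b = 14, n = 5 are illustrative values, giving N = 20 and p = 0.3):

```python
from math import comb

# Hypergeometric(a, b, n): mean np and variance np(1-p)(N-n)/(N-1), p = a/N.
a, b, n = 6, 14, 5
N = a + b
pmf = {x: comb(a, x) * comb(b, n - x) / comb(N, n) for x in range(min(a, n) + 1)}
mean = sum(x * q for x, q in pmf.items())
var = sum(x**2 * q for x, q in pmf.items()) - mean**2
p = a / N
print(mean, var, n * p * (1 - p) * (N - n) / (N - 1))
```

Note the finite-population correction factor (N - n)/(N - 1): the variance is smaller than the binomial npq because sampling is without replacement.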
The mean and variance of a Negative Binomial distribution
The Negative Binomial distribution arises when we repeat a Bernoulli trial until k successes (S) occur. Then X = the trial on which the k-th success occurred. The probability function of X is:
p(x) = C(x - 1, k - 1) p^k (1 - p)^(x - k), x = k, k + 1, ….
Let X1 = the number of the trial on which the 1st success occurred, and Xi = the number of trials after the (i - 1)-st success on which the i-th success occurred (i ≥ 2).
The Xi each have a geometric distribution with parameter p, so
E[Xi] = 1/p and Var(Xi) = (1 - p)/p².
Then X = X1 + … + Xk, and X1, …, Xk are mutually independent.
Thus if X has a negative binomial distribution with parameters k and p, then
E[X] = E[X1] + … + E[Xk] = k/p and Var(X) = Var(X1) + … + Var(Xk) = k(1 - p)/p².
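Summing the probability function over a long truncated range confirms these formulas (k = 3, p = 0.4 are illustrative; the tail beyond the truncation point is negligible):

```python
from math import comb

# Negative binomial: p(x) = C(x-1, k-1) p^k (1-p)^(x-k) for x = k, k+1, ...
# Expect E[X] = k/p = 7.5 and Var(X) = k(1-p)/p^2 = 11.25.
k, p = 3, 0.4
mass = mean = second = 0.0
for x in range(k, 400):                    # truncate; tail mass is negligible
    q = comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)
    mass += q
    mean += x * q
    second += x**2 * q
print(mass, mean, second - mean**2)        # ≈ 1, 7.5, 11.25
```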
Multivariate Moments Non-central and Central
Definition: Let X1 and X2 be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers (k1, k2), the joint moment of (X1, X2) of order (k1, k2) is defined to be:
μ′(k1,k2) = E[X1^k1 X2^k2].
Definition: Let X1 and X2 be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers (k1, k2), the joint central moment of (X1, X2) of order (k1, k2) is defined to be:
μ(k1,k2) = E[(X1 - μ1)^k1 (X2 - μ2)^k2], where μ1 = E[X1] and μ2 = E[X2].
Note: μ(1,1) = E[(X1 - μ1)(X2 - μ2)] = Cov(X1, X2), the covariance of X1 and X2.
Distribution functions, Moments, Moment generating functions in the Multivariate case
The distribution function F(x)
This is defined for any random variable X: F(x) = P[X ≤ x].
Properties:
1. F(-∞) = 0 and F(∞) = 1.
2. F(x) is non-decreasing (i.e. if x1 < x2 then F(x1) ≤ F(x2)).
3. F(b) - F(a) = P[a < X ≤ b].
4. Discrete Random Variables: F(x) is a non-decreasing step function, with a jump of height p(x) at each value x that X takes with positive probability.
5. Continuous Random Variables: F(x) is a non-decreasing continuous function; the slope of F at x is the density f(x). To find the probability density function f(x), one first finds F(x); then
f(x) = F′(x) = dF(x)/dx.
The joint distribution function F(x1, x2, …, xk) is defined for k random variables X1, X2, …, Xk:
F(x1, x2, …, xk) = P[X1 ≤ x1, X2 ≤ x2, …, Xk ≤ xk].
For k = 2: F(x1, x2) = P[X1 ≤ x1, X2 ≤ x2].
Properties:
1. F(x1, -∞) = F(-∞, x2) = F(-∞, -∞) = 0.
2. F(x1, ∞) = P[X1 ≤ x1, X2 ≤ ∞] = P[X1 ≤ x1] = F1(x1) = the marginal cumulative distribution function of X1.
F(∞, x2) = P[X1 ≤ ∞, X2 ≤ x2] = P[X2 ≤ x2] = F2(x2) = the marginal cumulative distribution function of X2.
F(∞, ∞) = P[X1 ≤ ∞, X2 ≤ ∞] = 1.
3. F(x1, x2) is non-decreasing in both the x1 direction and the x2 direction, i.e. if a1 < b1 and a2 < b2 then
i. F(a1, x2) ≤ F(b1, x2)
ii. F(x1, a2) ≤ F(x1, b2)
iii. F(a1, a2) ≤ F(b1, b2).
4. P[a < X1 ≤ b, c < X2 ≤ d] = F(b, d) - F(a, d) - F(b, c) + F(a, c).
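The rectangle rule can be checked with a case where the joint CDF has a closed form; for independent Uniform(0,1) variables (an illustrative choice), F(x1, x2) = x1·x2 on the unit square, and the rule recovers the area of the rectangle:

```python
# Independent Uniform(0,1) pair: F(x1, x2) = x1 * x2 on [0,1]^2.
def F(x1, x2):
    clamp = lambda t: max(0.0, min(1.0, t))
    return clamp(x1) * clamp(x2)

a, b, c, d = 0.2, 0.5, 0.1, 0.7
prob = F(b, d) - F(a, d) - F(b, c) + F(a, c)
print(prob)   # (0.5 - 0.2) * (0.7 - 0.1) = 0.18, the rectangle's area
```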
5. Discrete Random Variables: F(x1, x2) is a step surface.
6. Continuous Random Variables: F(x1, x2) is a continuous surface, with joint density f(x1, x2) = ∂²F(x1, x2)/∂x1∂x2.
Multivariate Moment Generating functions
Recall: the moment generating function of a single random variable X is
mX(t) = E[e^(tX)].
Definition: Let X1, X2, …, Xk be jointly distributed random variables (discrete or continuous). Then the joint moment generating function is defined to be:
m(t1, t2, …, tk) = E[e^(t1X1 + t2X2 + … + tkXk)].
Power series expansion of the joint moment generating function (k = 2):
m(t1, t2) = E[e^(t1X1 + t2X2)] = Σ(i=0 to ∞) Σ(j=0 to ∞) E[X1^i X2^j] t1^i t2^j / (i! j!),
so the joint moments μ′(i,j) = E[X1^i X2^j] appear as the coefficients of t1^i t2^j / (i! j!).
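As a numerical sketch of reading a joint moment out of the joint MGF (the Bernoulli parameters are illustrative): for independent Bernoulli(p1) and Bernoulli(p2), m(t1, t2) = (1 - p1 + p1·e^t1)(1 - p2 + p2·e^t2), and the mixed partial ∂²m/∂t1∂t2 at (0, 0) should equal E[X1X2] = p1·p2:

```python
import math

# Joint MGF of independent Bernoulli(p1), Bernoulli(p2) variables.
p1, p2 = 0.3, 0.6

def m(t1, t2):
    return (1 - p1 + p1 * math.exp(t1)) * (1 - p2 + p2 * math.exp(t2))

# Central finite difference for the mixed partial derivative at (0, 0).
h = 1e-4
mixed = (m(h, h) - m(h, -h) - m(-h, h) + m(-h, -h)) / (4 * h * h)
print(mixed)   # ≈ E[X1 X2] = p1 * p2 = 0.18
```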