Probability Theory Summary


1 Probability Theory Summary
Stats 241.3 Probability Theory Summary

2 Probability

3 Axioms of Probability A probability measure P is defined on S by assigning, to each event E, a value P[E] with the following properties: P[E] ≥ 0 for each E. P[S] = 1. P[A ∪ B] = P[A] + P[B] whenever A ∩ B = ∅; more generally, P[A1 ∪ A2 ∪ …] = P[A1] + P[A2] + … for mutually exclusive events A1, A2, …

4 Finite uniform probability space
Many examples fall into this category: a finite number of outcomes, all outcomes equally likely. To handle problems in this case we have to be able to count: count n(E) and n(S); then P[E] = n(E)/n(S).
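As a quick illustration (my own example, not from the deck), the counting can be done by brute-force enumeration in Python:

```python
from itertools import product

# Sample space S: ordered outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

# Event E: the two faces sum to 7.
E = [w for w in S if sum(w) == 7]

# Finite uniform space: P[E] = n(E) / n(S).
print(len(E), len(S), len(E) / len(S))    # 6 36 0.1666...
```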

5 Techniques for counting

6 Basic Rule of counting Suppose we carry out k operations in sequence. Let n1 = the number of ways the first operation can be performed, and ni = the number of ways the ith operation can be performed once the first (i – 1) operations have been completed, i = 2, 3, …, k. Then N = n1 n2 … nk = the number of ways the k operations can be performed in sequence.

7 Basic Counting Formulae
Permutations: the number of ways you can order n objects is n!. Permutations of size k (k ≤ n): the number of ways you can choose k objects from n objects in a specific order is n!/(n – k)!.

8 Combinations of size k (≤ n): A combination of size k chosen from n objects is a subset of size k where the order of selection is irrelevant. The number of ways you can choose a combination of size k from n objects (order of selection irrelevant) is C(n, k) = n!/(k!(n – k)!).

9 Important Notes In combinations ordering is irrelevant: different orderings result in the same combination. In permutations order is relevant: different orderings result in different permutations.
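These formulae are available directly in Python's standard library (math.perm and math.comb, Python 3.8+); a small sketch:

```python
from math import comb, factorial, perm

n, k = 10, 3
print(factorial(n))    # orderings of n objects: n! = 3628800
print(perm(n, k))      # ordered selections of k from n: n!/(n-k)! = 720
print(comb(n, k))      # unordered selections: n!/(k!(n-k)!) = 120
# Every combination of size k corresponds to k! permutations:
assert perm(n, k) == comb(n, k) * factorial(k)
```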

10 Rules of Probability

11 The additive rule P[A ∪ B] = P[A] + P[B] – P[A ∩ B], and
P[A ∪ B] = P[A] + P[B] if A ∩ B = ∅ (i.e. if A and B are mutually exclusive)

12 The additive rule for more than two events
P[A1 ∪ A2 ∪ … ∪ Ak] = Σi P[Ai] – Σi<j P[Ai ∩ Aj] + Σi<j<l P[Ai ∩ Aj ∩ Al] – … (inclusion–exclusion), and if Ai ∩ Aj = ∅ for all i ≠ j, then P[A1 ∪ A2 ∪ … ∪ Ak] = P[A1] + P[A2] + … + P[Ak]

13 The Rule for complements
For any event E: P[Ē] = 1 – P[E], where Ē denotes the complement of E

14 Conditional Probability, Independence and The Multiplicative Rule

15 The conditional probability of A given B is defined to be:
P[A|B] = P[A ∩ B] / P[B], provided P[B] ≠ 0

16 The multiplicative rule of probability
P[A ∩ B] = P[A] P[B|A] = P[B] P[A|B], and P[A ∩ B] = P[A] P[B] if A and B are independent. This is the definition of independence.

17 The multiplicative rule for more than two events
P[A1 ∩ A2 ∩ … ∩ Ak] = P[A1] P[A2|A1] P[A3|A1 ∩ A2] … P[Ak|A1 ∩ A2 ∩ … ∩ Ak–1]

18 Independence for more than 2 events

19 The set of k events A1, A2, … , Ak are called mutually independent if:
Definition: The set of k events A1, A2, …, Ak are called mutually independent if: P[Ai1 ∩ Ai2 ∩ … ∩ Aim] = P[Ai1] P[Ai2] … P[Aim] for every subset {i1, i2, …, im} of {1, 2, …, k}. i.e. for k = 3, A1, A2, A3 are mutually independent if: P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3], P[A2 ∩ A3] = P[A2] P[A3], and P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3]

20 The set of k events A1, A2, … , Ak are called pairwise independent if:
Definition: The set of k events A1, A2, …, Ak are called pairwise independent if: P[Ai ∩ Aj] = P[Ai] P[Aj] for all i and j with i ≠ j. i.e. for k = 3, A1, A2, A3 are pairwise independent if: P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3], P[A2 ∩ A3] = P[A2] P[A3]. It is not necessarily true that P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3].
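The standard illustration (a hypothetical example, not from the deck) is two fair coin tosses with A1 = {first toss is a head}, A2 = {second toss is a head}, A3 = {exactly one head}; the three pairwise conditions hold but the triple condition fails, as a brute-force check confirms:

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))              # 4 equally likely outcomes
P = lambda ev: Fraction(sum(ev(w) for w in S), len(S))

A1 = lambda w: w[0] == "H"                     # first toss is a head
A2 = lambda w: w[1] == "H"                     # second toss is a head
A3 = lambda w: (w[0] == "H") != (w[1] == "H")  # exactly one head

# Pairwise independence holds ...
assert P(lambda w: A1(w) and A2(w)) == P(A1) * P(A2)
assert P(lambda w: A1(w) and A3(w)) == P(A1) * P(A3)
assert P(lambda w: A2(w) and A3(w)) == P(A2) * P(A3)
# ... but mutual independence fails: P[A1∩A2∩A3] = 0, not (1/2)**3.
assert P(lambda w: A1(w) and A2(w) and A3(w)) == 0
```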

21 Bayes Rule for probability
P[A|B] = P[B|A] P[A] / (P[B|A] P[A] + P[B|Ā] P[Ā]), where Ā denotes the complement of A

22 A generalization of Bayes Rule
Let A1, A2, …, Ak denote a set of events such that Ai ∩ Aj = ∅ for all i ≠ j and A1 ∪ A2 ∪ … ∪ Ak = S (the events partition the sample space). Then
P[Ai|B] = P[B|Ai] P[Ai] / (P[B|A1] P[A1] + P[B|A2] P[A2] + … + P[B|Ak] P[Ak])
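A minimal computational sketch (the function and example numbers are illustrative, not from the course):

```python
def bayes(priors, likelihoods, i):
    """P[A_i | B] for a partition A_1..A_k with priors P[A_j] and
    likelihoods P[B | A_j], using the law of total probability for P[B]."""
    total = sum(p * l for p, l in zip(priors, likelihoods))   # P[B]
    return priors[i] * likelihoods[i] / total

# Hypothetical numbers: three machines produce 50%, 30%, 20% of output
# with defect rates 1%, 2%, 3%.  P[machine 1 | item defective]:
print(bayes([0.5, 0.3, 0.2], [0.01, 0.02, 0.03], 0))   # ≈ 0.294
```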

23 Random Variables – an important concept in probability

24 A random variable, X, is a numerical quantity whose value is determined by a random experiment

25 Definition – The probability function, p(x), of a random variable, X.
For any random variable, X, and any real number, x, we define p(x) = P[X = x] = P[{X = x}], where {X = x} = the set of all outcomes (an event) with X = x. For continuous random variables p(x) = 0 for all values of x.

26 Definition – The cumulative distribution function, F(x), of a random variable, X.
For any random variable, X, and any real number, x, we define F(x) = P[X ≤ x] = P[{X ≤ x}], where {X ≤ x} = the set of all outcomes (an event) with X ≤ x.

27 Discrete Random Variables
For a discrete random variable X the probability distribution is described by the probability function p(x), which has the following properties: p(x) ≥ 0 for each x; Σx p(x) = 1; and P[a ≤ X ≤ b] = Σ over a ≤ x ≤ b of p(x).

28 Graph: Discrete Random Variable
[Figure: bar graph of a probability function p(x) over values between a and b.]

29 Continuous random variables
For a continuous random variable X the probability distribution is described by the probability density function f(x), which has the following properties: f(x) ≥ 0; the total area under f(x) equals 1, i.e. ∫ f(x) dx = 1; and P[a ≤ X ≤ b] = ∫ from a to b of f(x) dx.

30 Graph: Continuous Random Variable – probability density function, f(x)
[Figure: a probability density curve f(x); probabilities correspond to areas under the curve.]

32 The distribution function F(x)
This is defined for any random variable, X. F(x) = P[X ≤ x] Properties F(–∞) = 0 and F(∞) = 1. F(x) is non-decreasing (i.e. if x1 < x2 then F(x1) ≤ F(x2)). F(b) – F(a) = P[a < X ≤ b].

33 p(x) = P[X = x] = F(x) – F(x⁻)
Here F(x⁻) = the limit of F(u) as u approaches x from the left. If p(x) = 0 for all x (i.e. X is continuous) then F(x) is continuous.

34 For Discrete Random Variables
F(x) is a non-decreasing step function with jumps of height p(x) at the points where p(x) > 0. [Figure: step-function graph of F(x) with jump heights p(x).]

35 For Continuous Random Variables
F(x) is a non-decreasing continuous function. [Figure: graph of F(x); the slope of F(x) at x equals f(x).] To find the probability density function, f(x), one first finds F(x); then f(x) = F′(x).

36 Some Important Discrete distributions

37 The Bernoulli distribution

38 Suppose that we have an experiment that has two outcomes
Success (S) and Failure (F). These terms are used in reliability testing. Suppose that p is the probability of success (S) and q = 1 – p is the probability of failure (F). This experiment is sometimes called a Bernoulli Trial. Let X = 1 if the outcome is S and X = 0 if the outcome is F. Then P[X = 1] = p and P[X = 0] = q = 1 – p.

39 The probability distribution with probability function
p(x) = P[X = x] = p^x q^(1 – x) for x = 0, 1 (i.e. p(1) = p and p(0) = q = 1 – p)
is called the Bernoulli distribution

40 The Binomial distribution

41 We observe a Bernoulli trial (S,F) n times.
Let X denote the number of successes in the n trials. Then X has a binomial distribution, i.e.
p(x) = P[X = x] = C(n, x) p^x q^(n – x), x = 0, 1, 2, …, n,
where p = the probability of success (S), and q = 1 – p = the probability of failure (F)

42 The Poisson distribution
Suppose events are occurring randomly and uniformly in time. Let X be the number of events occurring in a fixed period of time. Then X will have a Poisson distribution with parameter λ:
p(x) = P[X = x] = λ^x e^(–λ) / x!, x = 0, 1, 2, …

43 The Geometric distribution
Suppose a Bernoulli trial (S,F) is repeated until a success occurs. Let X = the trial on which the first success (S) occurs. The probability function of X is:
p(x) = P[X = x] = (1 – p)^(x – 1) p = p q^(x – 1), x = 1, 2, 3, …

44 The Negative Binomial distribution
Suppose a Bernoulli trial (S,F) is repeated until k successes occur. Let X = the trial on which the kth success (S) occurs. The probability function of X is:
p(x) = P[X = x] = C(x – 1, k – 1) p^k q^(x – k), x = k, k + 1, …

45 The Hypergeometric distribution
Suppose we have a population containing N objects. Suppose the elements of the population are partitioned into two groups. Let a = the number of elements in group A and let b = the number of elements in the other group (group B). Note N = a + b. Now suppose that n elements are selected from the population at random. Let X denote the number of selected elements from group A. The probability distribution of X is
p(x) = P[X = x] = C(a, x) C(b, n – x) / C(N, n)
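A sketch verifying that each of these probability functions sums to 1 over its range of values (pure standard library; the infinite Poisson and geometric sums are truncated where the tail is negligible):

```python
from math import comb, exp, factorial, isclose

def binomial(x, n, p):      return comb(n, x) * p**x * (1 - p)**(n - x)
def poisson(x, lam):        return lam**x * exp(-lam) / factorial(x)
def geometric(x, p):        return (1 - p)**(x - 1) * p        # x = 1, 2, ...
def hypergeom(x, a, b, n):  return comb(a, x) * comb(b, n - x) / comb(a + b, n)

# Each probability function should sum to 1 over its possible values.
assert isclose(sum(binomial(x, 10, 0.3) for x in range(11)), 1.0)
assert isclose(sum(poisson(x, 2.5) for x in range(50)), 1.0)
assert isclose(sum(geometric(x, 0.2) for x in range(1, 200)), 1.0)
assert isclose(sum(hypergeom(x, 5, 7, 4) for x in range(5)), 1.0)
print("all discrete pmfs sum to 1")
```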

46 Continuous Distributions

47 Continuous random variables
For a continuous random variable X the probability distribution is described by the probability density function f(x), which has the following properties: f(x) ≥ 0; the total area under f(x) equals 1, i.e. ∫ f(x) dx = 1; and P[a ≤ X ≤ b] = ∫ from a to b of f(x) dx.

48 Graph: Continuous Random Variable – probability density function, f(x)
[Figure: a probability density curve f(x); probabilities correspond to areas under the curve.]

49 Continuous Distributions
The Uniform distribution from a to b:
f(x) = 1/(b – a) for a ≤ x ≤ b, and f(x) = 0 otherwise

50 The Normal distribution (mean μ, standard deviation σ)
f(x) = (1/(σ√(2π))) e^(–(x – μ)²/(2σ²)), –∞ < x < ∞

51 The Exponential distribution
f(x) = λ e^(–λx) for x ≥ 0, and f(x) = 0 for x < 0

52 The Weibull distribution
A model for the lifetime of objects that do age.

53 The Weibull distribution with parameters α and β
f(x) = αβ x^(β – 1) e^(–α x^β) for x ≥ 0 (one common parameterization)

54 The Weibull density, f(x)
[Figure: Weibull densities f(x) for (α = 0.5, β = 2), (α = 0.7, β = 2) and (α = 0.9, β = 2).]

55 The Gamma distribution
An important family of distributions

56 The Gamma distribution
Let the continuous random variable X have density function:
f(x) = (λ^α / Γ(α)) x^(α – 1) e^(–λx) for x ≥ 0 (and 0 otherwise), where Γ(α) is the gamma function.
Then X is said to have a Gamma distribution with parameters α and λ.

57 Graph: The gamma distribution
[Figure: gamma densities f(x) for (α = 2, λ = 0.9), (α = 2, λ = 0.6) and (α = 3, λ = 0.6).]

58 Contained within this family are other distributions
Comments The set of gamma distributions is a family of distributions (parameterized by α and λ). Contained within this family are other distributions. The Exponential distribution – in the case α = 1, the gamma distribution becomes the exponential distribution with parameter λ. The exponential distribution arises if we are measuring the lifetime, X, of an object that does not age. It is also used as a distribution for waiting times between events occurring uniformly in time. The Chi-square distribution – in the case α = ν/2 and λ = ½, the gamma distribution becomes the chi-square (χ²) distribution with ν degrees of freedom. Later we will see that a sum of squares of independent standard normal variates has a chi-square distribution, with degrees of freedom = the number of independent terms in the sum of squares.

59 Expectation

Let X denote a discrete random variable with probability function p(x); then the expected value of X, E(X), is defined to be E(X) = Σx x p(x), and if X is continuous with probability density function f(x), E(X) = ∫ x f(x) dx.

61 Expectation of functions
Let X denote a discrete random variable with probability function p(x); then the expected value of g(X), E[g(X)], is defined to be E[g(X)] = Σx g(x) p(x), and if X is continuous with probability density function f(x), E[g(X)] = ∫ g(x) f(x) dx.

62 Moments of a Random Variable

the kth moment of X: μk = E(X^k). The first moment of X, μ = μ1 = E(X), is the center of gravity of the distribution of X. The higher moments give different information regarding the distribution of X.

64 the kth central moment of X: E[(X – μ)^k], where μ = E(X). The second central moment, E[(X – μ)²] = σ², is the variance of X.

65 Moment generating functions

66 Definition Let X denote a random variable. Then the moment generating function of X, mX(t), is defined by:
mX(t) = E(e^(tX)) = Σx e^(tx) p(x) if X is discrete, and ∫ e^(tx) f(x) dx if X is continuous.

67 Properties mX(0) = 1. The kth derivative of mX(t) at t = 0 gives the kth moment: mX^(k)(0) = μk = E(X^k). Hence mX(t) = 1 + μ1 t + μ2 t²/2! + μ3 t³/3! + …

68 Let X be a random variable with moment generating function mX(t)
Let X be a random variable with moment generating function mX(t). Let Y = bX + a. Then mY(t) = m(bX + a)(t) = E(e^((bX + a)t)) = e^(at) E(e^(X(bt))) = e^(at) mX(bt). Let X and Y be two independent random variables with moment generating functions mX(t) and mY(t). Then mX+Y(t) = E(e^((X + Y)t)) = E(e^(Xt) e^(Yt)) = E(e^(Xt)) E(e^(Yt)) = mX(t) mY(t).
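These rules can be checked numerically. A sketch using the binomial mgf mX(t) = (q + p e^t)^n (listed with the discrete mgfs below): the derivative at t = 0 should equal the mean np, and the product of two identical binomial mgfs should be the mgf of a binomial with 2n trials:

```python
from math import exp

n, p = 10, 0.3
q = 1 - p
m = lambda t: (q + p * exp(t))**n      # binomial moment generating function

# m'(0) = E(X) = np, approximated by a central difference:
h = 1e-5
print((m(h) - m(-h)) / (2 * h))        # ≈ 3.0 = n p

# Independent sums: m_{X+Y}(t) = m_X(t) m_Y(t).  For two independent
# binomial(n, p) variables the product is the binomial(2n, p) mgf:
t = 0.7
print(m(t) * m(t), (q + p * exp(t))**(2 * n))   # identical values
```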

69 Let X and Y be two random variables with moment generating functions mX(t) and mY(t) and distribution functions FX(x) and FY(y) respectively. If mX(t) = mY(t) then FX(x) = FY(x). This ensures that the distribution of a random variable can be identified by its moment generating function.

70 M. G. F.’s - Continuous distributions
Uniform from a to b: mX(t) = (e^(bt) – e^(at)) / ((b – a)t)
Normal(μ, σ): mX(t) = e^(μt + σ²t²/2)
Exponential(λ): mX(t) = λ/(λ – t) for t < λ
Gamma(α, λ): mX(t) = (λ/(λ – t))^α for t < λ
Chi-square(ν): mX(t) = (1 – 2t)^(–ν/2) for t < ½

71 M. G. F.’s - Discrete distributions
Bernoulli(p): mX(t) = q + p e^t
Binomial(n, p): mX(t) = (q + p e^t)^n
Poisson(λ): mX(t) = e^(λ(e^t – 1))
Geometric(p): mX(t) = p e^t / (1 – q e^t)
Negative binomial(k, p): mX(t) = (p e^t / (1 – q e^t))^k

72 Note: The distribution of a random variable X can be described by: the probability function p(x) (or the probability density function f(x) if X is continuous), the cumulative distribution function F(x), or the moment generating function mX(t).

75 Jointly distributed Random variables
Multivariate distributions

76 Discrete Random Variables

77 The joint probability function:
p(x, y) = P[X = x, Y = y]

78 Continuous Random Variables

79 Definition: Two random variables X and Y are said to have joint probability density function f(x, y) if P[(X, Y) ∈ A] = ∫∫ over A of f(x, y) dx dy for any region A of the plane.

80 Marginal and conditional distributions

81 Marginal Distributions (Discrete case):
Let X and Y denote two random variables with joint probability function p(x, y); then the marginal probability function of X is pX(x) = Σy p(x, y), and the marginal probability function of Y is pY(y) = Σx p(x, y).

82 Marginal Distributions (Continuous case):
Let X and Y denote two random variables with joint probability density function f(x, y); then the marginal density of X is fX(x) = ∫ f(x, y) dy, and the marginal density of Y is fY(y) = ∫ f(x, y) dx.

83 Conditional Distributions (Discrete Case):
Let X and Y denote two random variables with joint probability function p(x, y) and marginal probability functions pX(x), pY(y); then the conditional probability function of Y given X = x is pY|X(y|x) = p(x, y)/pX(x), and the conditional probability function of X given Y = y is pX|Y(x|y) = p(x, y)/pY(y).

84 Conditional Distributions (Continuous Case):
Let X and Y denote two random variables with joint probability density function f(x, y) and marginal densities fX(x), fY(y); then the conditional density of Y given X = x is fY|X(y|x) = f(x, y)/fX(x), and the conditional density of X given Y = y is fX|Y(x|y) = f(x, y)/fY(y).

85 The bivariate Normal distribution

86 Let f(x1, x2) = (1/(2πσ1σ2√(1 – ρ²))) e^(–Q/2), where
Q = [((x1 – μ1)/σ1)² – 2ρ((x1 – μ1)/σ1)((x2 – μ2)/σ2) + ((x2 – μ2)/σ2)²] / (1 – ρ²).
This distribution is called the bivariate Normal distribution. The parameters are μ1, μ2, σ1, σ2 and ρ.

87 Surface Plots of the bivariate Normal distribution

88 Marginal distributions
The marginal distribution of x1 is Normal with mean μ1 and standard deviation σ1. The marginal distribution of x2 is Normal with mean μ2 and standard deviation σ2.

89 Conditional distributions
The conditional distribution of x1 given x2 is Normal with: mean μ1 + ρ(σ1/σ2)(x2 – μ2) and standard deviation σ1√(1 – ρ²). The conditional distribution of x2 given x1 is Normal with: mean μ2 + ρ(σ2/σ1)(x1 – μ1) and standard deviation σ2√(1 – ρ²).

90 Independence

91 Definition: Two random variables X and Y are defined to be independent if p(x, y) = pX(x) pY(y) if X and Y are discrete, and f(x, y) = fX(x) fY(y) if X and Y are continuous.

92 Multivariate distributions (k ≥ 2)

93 Definition Let X1, X2, …, Xn denote n discrete random variables; then p(x1, x2, …, xn) is the joint probability function of X1, X2, …, Xn if p(x1, …, xn) = P[X1 = x1, …, Xn = xn], p(x1, …, xn) ≥ 0, and Σ…Σ p(x1, …, xn) = 1.

94 Definition Let X1, X2, …, Xk denote k continuous random variables; then f(x1, x2, …, xk) is the joint density function of X1, X2, …, Xk if f(x1, …, xk) ≥ 0 and ∫…∫ f(x1, …, xk) dx1 … dxk = 1.

95 The Multinomial distribution
Suppose that we observe an experiment that has k possible outcomes {O1, O2, …, Ok} independently n times. Let p1, p2, …, pk denote the probabilities of O1, O2, …, Ok respectively. Let Xi denote the number of times that outcome Oi occurs in the n repetitions of the experiment.

96 is called the Multinomial distribution
The joint probability function of X1, X2, …, Xk,
p(x1, x2, …, xk) = (n! / (x1! x2! … xk!)) p1^x1 p2^x2 … pk^xk, where x1 + x2 + … + xk = n,
is called the Multinomial distribution

97 The Multivariate Normal distribution
Recall the univariate normal distribution, f(x) = (1/(σ√(2π))) e^(–(x – μ)²/(2σ²)), and the bivariate normal distribution of slide 86.

98 The k-variate Normal distribution
f(x1, x2, …, xk) = (2π)^(–k/2) |Σ|^(–1/2) e^(–(x – μ)′Σ⁻¹(x – μ)/2), where x = (x1, …, xk)′, μ = (μ1, …, μk)′ is the vector of means and Σ is the k × k covariance matrix.

99 Marginal distributions

100 Definition Let X1, X2, …, Xq, Xq+1, …, Xk denote k discrete random variables with joint probability function p(x1, x2, …, xq, xq+1, …, xk); then the marginal joint probability function of X1, X2, …, Xq is p1…q(x1, …, xq) = Σ over xq+1, …, xk of p(x1, x2, …, xk).

101 Definition Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the marginal joint density function of X1, X2, …, Xq is f1…q(x1, …, xq) = ∫…∫ f(x1, x2, …, xk) dxq+1 … dxk.

102 Conditional distributions

103 Definition Let X1, X2, …, Xq, Xq+1, …, Xk denote k discrete random variables with joint probability function p(x1, x2, …, xq, xq+1, …, xk); then the conditional joint probability function of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is p(x1, …, xq | xq+1, …, xk) = p(x1, …, xk) / p(q+1)…k(xq+1, …, xk).

104 Definition Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the conditional joint density function of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is f(x1, …, xq | xq+1, …, xk) = f(x1, …, xk) / f(q+1)…k(xq+1, …, xk).

105 Definition – Independence of sets of vectors
Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the variables X1, X2, …, Xq are independent of Xq+1, …, Xk if f(x1, …, xk) = f1…q(x1, …, xq) f(q+1)…k(xq+1, …, xk). A similar definition applies for discrete random variables.

106 Definition – Mutual Independence
Let X1, X2, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xk); then the variables X1, X2, …, Xk are called mutually independent if f(x1, …, xk) = f1(x1) f2(x2) … fk(xk). A similar definition applies for discrete random variables.

107 Expectation for multivariate distributions

108 Definition Let X1, X2, …, Xn denote n jointly distributed random variables with joint density function f(x1, x2, …, xn); then E[g(X1, …, Xn)] = ∫…∫ g(x1, …, xn) f(x1, …, xn) dx1 … dxn.

109 Some Rules for Expectation

110 The Linearity property
Thus you can calculate E[Xi] either from the joint distribution of X1, …, Xn or the marginal distribution of Xi. The Linearity property: E[a1X1 + a2X2 + … + anXn] = a1E[X1] + a2E[X2] + … + anE[Xn].

111 (The Multiplicative property) Suppose X1, …, Xq are independent of Xq+1, …, Xk; then
E[g(X1, …, Xq) h(Xq+1, …, Xk)] = E[g(X1, …, Xq)] E[h(Xq+1, …, Xk)]. In the simple case when k = 2: if X and Y are independent, E[XY] = E[X] E[Y].

112 Some Rules for Variance
Var(X) = E[(X – μ)²] = E(X²) – μ², where μ = E(X); Var(aX + b) = a² Var(X); Cov(X, Y) = E[(X – μX)(Y – μY)]; Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).

113 Tchebychev’s inequality
P[|X – μ| ≥ kσ] ≤ 1/k². Ex: with k = 2, at least 1 – 1/4 = 75% of any distribution lies within two standard deviations of its mean.

114 Note: If X and Y are independent, then Cov(X, Y) = 0 and Var(X + Y) = Var(X) + Var(Y).

115 The correlation coefficient ρXY
ρXY = Cov(X, Y) / (σX σY). Properties: 1. –1 ≤ ρXY ≤ 1; 2. ρXY = ±1 if and only if there exist a and b such that Y = bX + a, where ρXY = +1 if b > 0 and ρXY = –1 if b < 0.

116 Some other properties of variance
Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

117 Variance: Multiplicative Rule for independent random variables
Suppose that X and Y are independent random variables; then: Var(XY) = σX² σY² + μX² σY² + μY² σX².

118 Mean and Variance of averages
Let X1, …, Xn be n mutually independent random variables each having mean μ and standard deviation σ (variance σ²). Let x̄ = (X1 + X2 + … + Xn)/n. Then E[x̄] = μ and Var(x̄) = σ²/n (standard deviation σ/√n).
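A Monte Carlo sketch of these two facts, using Exponential(1) variates (for which μ = σ = 1, an assumption of this example):

```python
import random

n, reps = 25, 20000
# Exponential(1) has mean 1 and variance 1, so x-bar should have
# mean ≈ 1 and variance ≈ 1/n = 0.04.
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(reps)]
m = sum(means) / reps
v = sum((x - m)**2 for x in means) / (reps - 1)
print(m, v)    # ≈ 1.0 and ≈ 0.04
```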

119 The Law of Large Numbers
Let X1, …, Xn be n mutually independent random variables each having mean μ. Let x̄ = (X1 + … + Xn)/n. Then for any δ > 0 (no matter how small): P[|x̄ – μ| < δ] → 1 as n → ∞.

120 Conditional Expectation:

121 Definition Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the conditional joint density function of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is f(x1, …, xq | xq+1, …, xk) = f(x1, …, xk) / f(q+1)…k(xq+1, …, xk).

122 Definition Let U = h( X1, X2, …, Xq, Xq+1 …, Xk )
then the Conditional Expectation of U given Xq+1 = xq+1, …, Xk = xk is E[U | xq+1, …, xk] = ∫…∫ h(x1, …, xk) f(x1, …, xq | xq+1, …, xk) dx1 … dxq. Note this will be a function of xq+1, …, xk.

123 A very useful rule Let (x1, x2, …, xq, y1, y2, …, ym) = (x, y) denote q + m random variables. Then E[E[g(x, y) | x]] = E[g(x, y)].
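A simulation sketch of this rule with a hypothetical hierarchy, X ~ Uniform(0, 1) and Y | X = x ~ Exponential with mean x, so that E[E[Y | X]] = E[X] = 1/2:

```python
import random

reps = 200_000
total = 0.0
for _ in range(reps):
    x = 1.0 - random.random()          # X ~ Uniform(0, 1], avoids x = 0
    y = random.expovariate(1.0 / x)    # Y | X = x is Exponential with mean x
    total += y
print(total / reps)                    # ≈ 0.5 = E[E[Y | X]] = E[X]
```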

124 Functions of Random Variables

125 Methods for determining the distribution of functions of Random Variables
Distribution function method
Moment generating function method
Transformation method

126 Distribution function method
Let X, Y, Z, … have joint density f(x, y, z, …). Let W = h(X, Y, Z, …). First step: find the distribution function of W, G(w) = P[W ≤ w] = P[h(X, Y, Z, …) ≤ w]. Second step: find the density function of W, g(w) = G′(w).
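Worked sketch (my own example): if X ~ Uniform(0, 1) and W = X², then G(w) = P[X² ≤ w] = P[X ≤ √w] = √w for 0 < w < 1, so g(w) = G′(w) = 1/(2√w). The first step can be checked empirically:

```python
import random

w, reps = 0.25, 100_000
# Empirical G(w) = P[W <= w] for W = X^2 with X ~ Uniform(0, 1):
hits = sum(random.random()**2 <= w for _ in range(reps))
print(hits / reps, w**0.5)    # both ≈ 0.5 = sqrt(0.25)
```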

127 Use of moment generating functions
Using the moment generating functions of X, Y, Z, …, determine the moment generating function of W = h(X, Y, Z, …). Identify the distribution of W from its moment generating function. This procedure works well for sums, linear combinations, averages, etc.

128 Let x1, x2, … denote a sequence of independent random variables
Sums: let S = x1 + x2 + … + xn; then mS(t) = mx1(t) mx2(t) … mxn(t). Linear Combinations: let L = a1x1 + a2x2 + … + anxn; then mL(t) = mx1(a1t) mx2(a2t) … mxn(ant).

129 Arithmetic Means Let x1, x2, … denote a sequence of independent random variables coming from a distribution with moment generating function m(t). Let x̄ = (x1 + … + xn)/n; then mx̄(t) = [m(t/n)]^n.

130 The Transformation Method
Theorem Let X denote a random variable with probability density function f(x) and U = h(X). Assume that h(x) is either strictly increasing (or decreasing). Then the probability density of U is: g(u) = f(h⁻¹(u)) |d h⁻¹(u)/du|.

131 The Transformation Method (many variables)
Theorem Let x1, x2, …, xn denote random variables with joint probability density function f(x1, x2, …, xn). Let u1 = h1(x1, x2, …, xn), u2 = h2(x1, x2, …, xn), …, un = hn(x1, x2, …, xn) define an invertible transformation from the x’s to the u’s.

132 Then the joint probability density function of u1, u2,…, un is given by:
g(u1, …, un) = f(x1(u1, …, un), …, xn(u1, …, un)) |J|, where J = det[∂xi/∂uj] is the Jacobian of the transformation.

133 Some important results
Distribution of functions of random variables

134 The method used to derive these results will be indicated by:
DF - Distribution Function Method; MGF - Moment Generating Function Method; TF - Transformation Method.

135 Student’s t distribution
Let Z and U be two independent random variables with: Z having a Standard Normal distribution and U having a χ² distribution with ν degrees of freedom. Then the distribution of t = Z / √(U/ν) is Student’s t distribution with ν degrees of freedom. (DF)

136 The Chi-square distribution
Let Z1, Z2, …, Zν be ν independent random variables each having a Standard Normal distribution; then U = Z1² + Z2² + … + Zν² has a χ² distribution with ν degrees of freedom. (For ν = 1: DF; for ν > 1: MGF)

137 Distribution of the sample mean
Let x1, x2, …, xn denote a sample from the normal distribution with mean μ and variance σ². Then x̄ = (x1 + … + xn)/n has a Normal distribution with mean μ and variance σ²/n. (MGF)

138 The Central Limit theorem
If x1, x2, …, xn is a sample from a distribution with mean μ and standard deviation σ, then, if n is large, x̄ has approximately a normal distribution with mean μ and variance σ²/n. (MGF)
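A simulation sketch: means of n = 30 Uniform(0, 1) variables (μ = ½, σ² = 1/12) should put roughly 68% of standardized values within one standard deviation, as the normal approximation predicts:

```python
import random

n, reps = 30, 20000
mu = 0.5                       # mean of Uniform(0, 1)
sd = (1 / (12 * n))**0.5       # standard deviation of x-bar
inside = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    inside += abs((xbar - mu) / sd) <= 1
print(inside / reps)           # ≈ 0.683, the normal P[|Z| <= 1]
```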

139 Distribution of sums of Gamma R. V.’s
Let X1, X2, …, Xn denote n independent random variables each having a gamma distribution with parameters (λ, αi), i = 1, 2, …, n. Then W = X1 + X2 + … + Xn has a gamma distribution with parameters (λ, α1 + α2 + … + αn). (MGF)
Distribution of a multiple of a Gamma R. V.: Suppose that X is a random variable having a gamma distribution with parameters (λ, α). Then W = aX has a gamma distribution with parameters (λ/a, α). (MGF)

140 Distribution of sums of Binomial R. V.’s
Let X1, X2, …, Xk denote k independent random variables each having a binomial distribution with parameters (p, ni), i = 1, 2, …, k. Then W = X1 + X2 + … + Xk has a binomial distribution with parameters (p, n1 + n2 + … + nk). (MGF)
Distribution of sums of Negative Binomial R. V.’s: Let X1, X2, …, Xn denote n independent random variables each having a negative binomial distribution with parameters (p, ki), i = 1, 2, …, n. Then W = X1 + X2 + … + Xn has a negative binomial distribution with parameters (p, k1 + k2 + … + kn). (MGF)

141 Beyond Stats 241 – Courses that can be taken after Stats 241

142 Statistics

143 What is Statistics? It is the major mathematical tool of scientific inference – methods for drawing conclusions from data, data that is to some extent corrupted by some component of random variation (random noise).

144 In both Statistics and Probability theory we are concerned with studying random phenomena

145 In probability theory the model is known and we are interested in predicting the outcomes and observations of the phenomena. [Diagram: model → outcomes and observations]

146 In statistics the model is unknown
The outcomes and observations of the phenomena have been observed. We are interested in determining the model from the observations. [Diagram: outcomes and observations → model]

147 Example - Probability A coin is tossed n = 100 times
We are interested in the observation, X, the number of times the coin comes up heads. Assuming the coin is balanced (i.e. p = the probability of a head = ½), X has a binomial distribution with n = 100 and p = ½.

148 Example - Statistics We are interested in the success rate, p, of a new surgical procedure. The procedure is performed n = 100 times. X, the number of times the procedure is successful, is 82. The success rate p is unknown.

149 If the success rate p were known, then
P[X = x] = C(100, x) p^x (1 – p)^(100 – x). This equation allows us to predict the value of the observation, X.

150 In the case when the success rate p is unknown
the following equation is still true: P[X = x] = C(100, x) p^x (1 – p)^(100 – x), with p the unknown success rate. We will want to use the value of the observation, X = 82, to make a decision regarding the value of p.
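A sketch of that decision step: evaluating the binomial probability of the observed X = 82 over a grid of candidate p values shows that p near 0.82 makes the observation most probable (the maximum-likelihood idea developed in later courses):

```python
from math import comb

n, x = 100, 82
likelihood = lambda p: comb(n, x) * p**x * (1 - p)**(n - x)

for p in (0.5, 0.7, 0.82, 0.9):
    print(p, likelihood(p))
# The value p = 0.82 (= x/n) gives the largest probability: the
# observation X = 82 points toward a success rate near 0.82.
```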

151 Introductory Statistics Courses
Non calculus based: Stats 244.3 and Stats 245.3. Calculus based: Stats 242.3.

152 Stats 244.3 Statistical concepts and techniques including graphing of distributions, measures of location and variability, measures of association, regression, probability, confidence intervals, hypothesis testing. Students should consult with their department before enrolling in this course to determine the status of this course in their program. Prerequisite(s): a course in a social science or Mathematics A30.

153 Stats 245.3 An introduction to basic statistical methods including frequency distributions, elementary probability, confidence intervals and tests of significance, analysis of variance, regression and correlation, contingency tables, goodness of fit. Prerequisite(s): MATH 100, 101, 102, 110 or STAT 103.

154 Stats 242.3 Sampling theory, estimation, confidence intervals, testing hypotheses, goodness of fit, analysis of variance, regression and correlation. Prerequisite(s): MATH 110, 116 and STAT 241.

155 Stats 244 and 245 do not require a calculus prerequisite and are “recipe” courses. Stats 242 does require calculus and probability (Stats 241) as a prerequisite; it is a more theoretical class in which you learn techniques for developing statistical procedures and thoroughly investigating the properties of these procedures.

156 Statistics Courses beyond Stats 242.3

157 STAT 341.3 Probability and Stochastic Processes 1/2(3L-1P) Prerequisite(s): STAT 241. Random variables and their distributions; independence; moments and moment generating functions; conditional probability; Markov chains; stationary time-series.

158 STAT 342.3 Mathematical Statistics 1(3L-1P) Prerequisite(s): MATH 225 or 276; STAT 241 and 242. Probability spaces; conditional probability and independence; discrete and continuous random variables; standard probability models; expectations; moment generating functions; sums and functions of random variables; sampling distributions; asymptotic distributions. Deals with basic probability concepts at a moderately rigorous level. Note: Students with credit for STAT 340 may not take this course for credit.

159 STAT 344.3 Applied Regression Analysis 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics. Applied regression analysis involving the extensive use of computer software. Includes: linear regression; multiple regression; stepwise methods; residual analysis; robustness considerations; multicollinearity; biased procedures; non-linear regression. Note: Students with credit for ECON 404 may not take this course for credit. Students with credit for STAT 344 will receive only half credit for ECON 404.

160 STAT 345.3 Design and Analysis of Experiments 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics. An introduction to the principles of experimental design and analysis of variance. Includes: randomization, blocking, factorial experiments, confounding, random effects, analysis of covariance. Emphasis will be on fundamental principles and data analysis techniques rather than on mathematical theory.

161 STAT 346.3 Multivariate Analysis 1/2(3L-1P) Prerequisite(s): MATH 266, STAT 241, and 344 or 345. The multivariate normal distribution, multivariate analysis of variance, discriminant analysis, classification procedures, multiple covariance analysis, factor analysis, computer applications.

162 STAT 347.3 Non Parametric Methods 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics. An introduction to the ideas and techniques of non-parametric analysis. Includes: one, two and K samples problems, goodness of fit tests, randomness tests, and correlation and regression.

163 STAT 348.3 Sampling Techniques 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics. Theory and applications of sampling from finite populations. Includes: simple random sampling, stratified random sampling, cluster sampling, systematic sampling, probability proportionate to size sampling, and the difference, ratio and regression methods of estimation.

164 STAT 349.3 Time Series Analysis 1/2(3L-1P) Prerequisite(s): STAT 241, and 344 or 345. An introduction to statistical time series analysis. Includes: trend analysis, seasonal variation, stationary and non-stationary time series models, serial correlation, forecasting and regression analysis of time series data.

165 STAT 442.3 Statistical Inference 2(3L-1P) Prerequisite(s): STAT 342. Parametric estimation, maximum likelihood estimators, unbiased estimators, UMVUE, confidence intervals and regions, tests of hypotheses, Neyman Pearson Lemma, generalized likelihood ratio tests, chi-square tests, Bayes estimators.

166 STAT 443.3 Linear Statistical Models 2(3L-1P) Prerequisite(s): MATH 266, STAT 342, and 344 or 345. A rigorous examination of the general linear model using vector space theory. Includes: generalized inverses; orthogonal projections; quadratic forms; Gauss-Markov theorem and its generalizations; BLUE estimators; non-full-rank models; estimability considerations.

