STATISTICS AND PROBABILITY

Raoul LePage, Professor, STATISTICS AND PROBABILITY (click on STT315_F06). Week and some preparation for exam 2.

SUGGESTED EXERCISES (solutions given in text): 3-33, 3-41, 3-42 (except b, c, h, m, n), 3-43, 3-49, 3-57 (except c, d), 3-59, 3-61, 3-63, 3-65. Textbook exercises are not comprehensive.

PROBABILITY MODELS HAVING BROAD APPLICATION
NORMAL DISTRIBUTION, BERNOULLI TRIALS, BINOMIAL DISTRIBUTION, POISSON DISTRIBUTION

NORMAL DISTRIBUTION: WHERE ARE THE MEAN AND STANDARD DEVIATION IN THIS PICTURE?
Note the balance point (the mean) and the point of inflexion (one standard deviation from the mean).

IQ DISTRIBUTION: ~NORMAL, MEAN 100, STANDARD DEVIATION 15
The point of inflexion lies SD = 15 from MEAN = 100.

DISTRIBUTION OF THE NUMBER OF HEADS IN 100 COIN TOSSES: APPROXIMATELY NORMAL, MEAN 50, STD DEVIATION 5

DISTRIBUTION OF THE NUMBER OF ACCIDENTS IN ONE MONTH IF WE AVERAGE 39.7 PER MONTH: APPROXIMATELY NORMAL, MEAN 39.7, STD DEVIATION 6.3
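
The two standard deviations quoted above can be checked with a short sketch. It assumes the coin-toss count is binomial with sd = root(n p q) and that the accident count is Poisson with sd = root(mean); the variable names are illustrative only.

```python
import math

# Heads in 100 fair coin tosses: binomial with n = 100, p = 0.5
n, p = 100, 0.5
heads_mean = n * p
heads_sd = math.sqrt(n * p * (1 - p))

# Accidents averaging 39.7 per month, treated as Poisson (an assumption)
accidents_mean = 39.7
accidents_sd = math.sqrt(accidents_mean)

print(heads_mean, heads_sd)     # 50.0 5.0
print(round(accidents_sd, 1))   # 6.3
```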

NORMAL DISTRIBUTIONS ARE ALIKE IN SD UNITS FROM THE MEAN
~68% WITHIN 1 SD OF MEAN, ~95% WITHIN 2 SD OF MEAN
Illustrated for the Standard Normal (Mean = 0, SD = 1): ~68% within 1 SD.

NORMAL DISTRIBUTIONS ARE ALIKE IN SD UNITS FROM THE MEAN
Illustrated for the Standard Normal (Mean = 0, SD = 1): ~95% within 2 SD.
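
As a rough check of the 68% and 95% figures, the sketch below uses the error function from Python's standard library. The helper name `prob_within` is an illustrative choice, not something from the slides.

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

print(round(prob_within(1), 4))  # 0.6827
print(round(prob_within(2), 4))  # 0.9545
```

Because every normal distribution looks the same in SD units, these two numbers apply to the IQ, coin-toss, and accident examples alike.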

IQ DISTRIBUTION: ~NORMAL, MEAN 100, STANDARD DEVIATION 15
~68/2 = 34% of IQs fall between 85 and 100; ~95/2 = 47.5% fall between 100 and 130.

STANDARD SCORES CONVERT TO MEAN 0, SD 1
Z = (IQ - 100) / 15. An IQ of 100 becomes Z = 0, and each 15 IQ points is 1 unit of Z, so Z is Standard Normal.
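
A minimal sketch of the standard-score conversion for the IQ example, assuming the usual formula Z = (x - mean) / sd. The function name `standard_score` is hypothetical.

```python
def standard_score(x, mean, sd):
    """Convert a raw score to SD units from the mean."""
    return (x - mean) / sd

print(standard_score(100, 100, 15))  # 0.0
print(standard_score(85, 100, 15))   # -1.0
print(standard_score(130, 100, 15))  # 2.0
```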

Z - TABLE: CUT AND PASTE
P(Z > 0) = P(Z < 0) = 0.5
P(0 < Z < 1.92) = 0.4726
P(Z < 1.92) = P(Z < 0) + P(0 < Z < 1.92) = 0.5 + 0.4726 = 0.9726
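
Z-table entries such as those for 1.92 can be reproduced without a table. This sketch assumes the standard identity relating the normal cumulative probability to the error function; `phi` is an illustrative name.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_below = phi(1.92)               # P(Z < 1.92)
p_between = phi(1.92) - phi(0)    # P(0 < Z < 1.92)

print(round(phi(0), 4))      # 0.5
print(round(p_below, 4))     # 0.9726
print(round(p_between, 4))   # 0.4726
```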

BERNOULLI DISTRIBUTION
x      p(x)
1      p     (1 denotes “success”)
0      q     (0 denotes “failure”)
total  1
0 < p < 1, q = 1 - p

Notation: BERNOULLI RANDOM VARIABLE X
P(success) = P(X = 1) = p
P(failure) = P(X = 0) = q
e.g. X = “sample voter is Democrat.” Population has 48% Dem, so p = 0.48, q = 0.52, and P(X = 1) = 0.48.

INDEPENDENT BERNOULLI-p
"S" denotes success, "F" denotes failure.
P(S1 S2 F3 F4 F5 F6 S7) = p^3 q^4; just write P(SSFFFFS) = p^3 q^4.
“The answer only depends upon how many of each, not their order.”
e.g. 48% Dem, 5 sampled with replacement: P(Dem Rep Dem Dem Rep) = (0.48)^3 (0.52)^2 ~ 0.0299
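
The product rule for independent trials can be sketched as follows, using the 48% Democrat figure from the slide. The function name `sequence_probability` and the letters "D" and "R" are illustrative choices.

```python
import math

p = 0.48      # P(Dem), from the slide
q = 1 - p     # P(Rep)

def sequence_probability(sequence):
    """Probability of one specific ordered string of 'D' and 'R' draws."""
    prob = 1.0
    for outcome in sequence:
        prob *= p if outcome == "D" else q
    return prob

print(round(sequence_probability("DRDDR"), 4))  # 0.0299

# Reordering the same letters leaves the probability unchanged.
print(math.isclose(sequence_probability("DRDDR"),
                   sequence_probability("DDDRR")))  # True
```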

BINOMIAL DISTRIBUTION FOR THE TOTAL NUMBER OF SUCCESSES IN INDEPENDENT p-BERNOULLI TRIALS
e.g. P(exactly 2 Dems out of sample of 4) = P(DDRR) + P(DRDR) + P(DRRD) + P(RDDR) + P(RDRD) + P(RRDD) = 6 (0.48)^2 (0.52)^2 ~ 0.374
There are 6 ways to arrange 2D 2R.

BINOMIAL DISTRIBUTION FOR THE TOTAL NUMBER OF SUCCESSES IN INDEPENDENT p-BERNOULLI TRIALS
e.g. P(exactly 3 Dems out of sample of 5) = P(DDDRR) + P(DDRDR) + P(DDRRD) + P(DRDDR) + P(DRDRD) + P(DRRDD) + P(RDDDR) + P(RDDRD) + P(RDRDD) + P(RRDDD) = 10 (0.48)^3 (0.52)^2 ~ 0.299
There are 10 ways to arrange 3D 2R. Same as the number of ways to select 3 from 5.
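
The "list every arrangement and add" approach can be sketched by brute force. This assumes p = 0.48 as in the voter example; `count_and_probability` is a hypothetical helper name.

```python
from itertools import product

p, q = 0.48, 0.52  # P(Dem), P(Rep) from the voter example

def count_and_probability(successes, trials):
    """Enumerate every D/R string; count and sum those with the right number of D."""
    count = 0
    total = 0.0
    for seq in product("DR", repeat=trials):
        if seq.count("D") == successes:
            count += 1
            total += p ** successes * q ** (trials - successes)
    return count, total

count_2_of_4, prob_2_of_4 = count_and_probability(2, 4)
count_3_of_5, prob_3_of_5 = count_and_probability(3, 5)
print(count_2_of_4, round(prob_2_of_4, 3))  # 6 0.374
print(count_3_of_5, round(prob_3_of_5, 3))  # 10 0.299
```

Enumeration is only practical for small samples; the counting argument and formula in the following slides replace it.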

COUNTING ARRANGEMENTS
There are 5! ways to arrange 5 things in a line. Do it thus (1:1 with arrangements): select 3 of the 5 to go first in line, arrange those 3 at the head of the line, then arrange the remaining 2 after.
5! = (ways to select 3 from 5) × 3! × 2!
So the number of ways must be 5! / (3! 2!) = 10.
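
The counting argument can be verified numerically. This sketch uses `math.factorial`, `math.comb` (Python 3.8 or later), and `itertools.combinations` from the standard library.

```python
import math
from itertools import combinations

# 5! = (ways to select 3 from 5) * 3! * 2!, solved for the number of ways
ways_by_factorials = math.factorial(5) // (math.factorial(3) * math.factorial(2))

# The same count from the built-in binomial coefficient
ways_by_comb = math.comb(5, 3)

# And by listing every selection of 3 positions out of 5
ways_by_listing = len(list(combinations(range(5), 3)))

print(ways_by_factorials, ways_by_comb, ways_by_listing)  # 10 10 10
```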

BINOMIAL FORMULA
Let random variable X denote the number of “S” in n independent Bernoulli p-trials. By definition, X has a Binomial Distribution, and for each of x = 0, 1, 2, …, n
P(X = x) = (n! / (x! (n-x)!)) p^x q^(n-x)
e.g. P(44 Dems in sample of 100 voters) = (100! / (44! 56!)) (0.48)^44 (0.52)^56
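
A minimal sketch of the binomial formula, assuming p = 0.48 for the voter example as in the earlier slides. The function name `binomial_pmf` is illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = (n! / (x! (n-x)!)) p^x q^(n-x) with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# 44 Democrats in a with-replacement sample of 100 voters, 48% Dem population
prob_44_of_100 = binomial_pmf(44, 100, 0.48)
print(prob_44_of_100)

# The probabilities over x = 0, 1, ..., n add up to 1
total = sum(binomial_pmf(x, 100, 0.48) for x in range(101))
print(round(total, 6))  # 1.0
```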

Caveats: Binomial Coefficient
n! / (x! (n-x)!) is the count of how many arrangements there are of a string of x letters “S” and n-x letters “F.”
p^x q^(n-x) is the shared probability of each string of x letters “S” and n-x letters “F.”
(Define 0! = 1 and p^0 = q^0 = 1, and the formula goes through for every one of x = 0 through n.)
The arrangement count n! / (x! (n-x)!) is called the Binomial Coefficient.

Normal Approx of Binomial
Poisson and its Normal Approx
Aspects of Random Sampling

Normal Approx of Binomial
n = 10, p = 0.4: mean = n p = 4, sd = root(n p q) ~ 1.55
n = 30, p = 0.4: mean = n p = 12, sd = root(n p q) ~ 2.683
n = 100, p = 0.4: mean = n p = 40, sd = root(n p q) ~ 4.899
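
One way to see the approximation at work is to compare the exact binomial probability at the mean with the height of the matching normal curve. This is only a sketch; the helper names are hypothetical, and comparing at a single point is a simplification of what the pictures show.

```python
import math

def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_density(x, mean, sd):
    """Height at x of the normal curve with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def compare_at_mean(n, p):
    """Return (sd, exact binomial probability, normal curve height) at x = n p."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    x = round(mean)
    return sd, binomial_pmf(x, n, p), normal_density(x, mean, sd)

for n in (10, 30, 100):
    sd, exact, approx = compare_at_mean(n, 0.4)
    print(n, round(sd, 3), round(exact, 4), round(approx, 4))
```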

Poisson Distribution Governing Counts of Rare Events
p(x) = e^(-mean) mean^x / x! for x = 0, 1, 2, … ad infinitum

Poisson e.g. X = number of times ace of spades turns up in 104 tries
X ~ Poisson with mean 2 (104 tries × 1/52 chance per try)
p(x) = e^(-mean) mean^x / x!
e.g. p(3) = e^(-2) 2^3 / 3! ~ 0.18

Poisson e.g. X = number of raisins in MY cookie. Batter has 400 raisins and makes 144 cookies.
E X = 400/144 ~ 2.78 per cookie
p(x) = e^(-mean) mean^x / x!
e.g. p(2) = e^(-2.78) 2.78^2 / 2! ~ 0.24 (around 24% of cookies have 2 raisins)
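
Both Poisson examples can be checked with a few lines. This sketch implements p(x) = e^(-mean) mean^x / x! directly; `poisson_pmf` is an illustrative name.

```python
import math

def poisson_pmf(x, mean):
    """p(x) = e^(-mean) mean^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

# Ace of spades in 104 tries: mean 2
print(round(poisson_pmf(3, 2), 2))          # 0.18

# Raisins per cookie: mean 400/144
print(round(poisson_pmf(2, 400 / 144), 2))  # 0.24
```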

THE FIRST BEST THING ABOUT THE POISSON IS THAT THE MEAN ALONE TELLS US THE ENTIRE DISTRIBUTION!
note: Poisson sd = root(mean)

E X = 400/144 ~ 2.78 raisins per cookie, sd = root(mean) ~ 1.67 (for Poisson)

THE SECOND BEST THING ABOUT THE POISSON IS THAT FOR A MEAN AS SMALL AS 3 THE NORMAL APPROXIMATION WORKS WELL.
e.g. mean 2.78, sd = root(mean) ~ 1.67 (special to Poisson)

WE AVERAGE 127.8 ACCIDENTS PER MONTH. E X = 127.8 accidents.
If Poisson, then sd = root(mean) = root(127.8) ~ 11.3 (special to Poisson), and the approximate distribution is normal with mean 127.8 and sd 11.3.
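
A short sketch of the Poisson sd rule, combined with the earlier "~95% within 2 SD" rule to give a rough range for monthly accidents. The helper names are hypothetical, and the range relies on the normal approximation being adequate.

```python
import math

def poisson_sd(mean):
    """For a Poisson count, sd = root(mean)."""
    return math.sqrt(mean)

def two_sd_range(mean):
    """Approximate range holding ~95% of counts, via the normal approximation."""
    sd = poisson_sd(mean)
    return mean - 2 * sd, mean + 2 * sd

print(round(poisson_sd(400 / 144), 2))  # 1.67
print(round(poisson_sd(127.8), 1))      # 11.3

low, high = two_sd_range(127.8)
print(round(low, 1), round(high, 1))    # 105.2 150.4
```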

Aspects of Random Sampling

THE GREAT TRICK OF STATISTICS
The overwhelming majority of samples of n from a population of N can stand in for the population.
Population of N = 5: ATT, Sysco, Pepsico, GM, Dow
Sample of n = 2: ATT, Pepsico

GREAT TRICK: SOME CAVEATS
Sample size n must be “large.”
It works for only a few characteristics at a time, such as profit, sales, dividend.
SPECTACULAR FAILURES MAY OCCUR!
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9. Sample of n = 2.

HOW ARE WE SAMPLING? With-replacement vs without replacement.
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9. Sample of n = 2.
With replacement, a company already drawn (e.g. Pepsi 42) may be drawn again.

GREAT TRICK: SOME CAVEATS
Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9
Sample of n = 2: Sysco 21, Pepsi 42
This sample is obviously “not representative.”
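
With only five companies, every without-replacement sample of two can be listed, which shows how far a sample mean can stray when n is this small. This is a sketch using the figures on the slide; treating the numbers as a single characteristic to be averaged is an assumption.

```python
from itertools import combinations

population = {"ATT": 12, "Sysco": 21, "Pepsi": 42, "GM": 8, "Dow": 9}
pop_mean = sum(population.values()) / len(population)

# Every sample of n = 2 without replacement, with its sample mean
sample_means = {
    pair: sum(population[name] for name in pair) / 2
    for pair in combinations(population, 2)
}

print(pop_mean)                             # 18.4
print(len(sample_means))                    # 10
print(sample_means[("Sysco", "Pepsi")])     # 31.5
print(min(sample_means.values()), max(sample_means.values()))  # 8.5 31.5
```

The Sysco and Pepsi sample gives the largest mean of all ten, far above the population mean, which is why it is "not representative."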

DOES IT MAKE A DIFFERENCE?
Rule of thumb: sampling with and without replacement are about the same if root[(N-n)/(N-1)] ~ 1, for a population of N and a sample of n.
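
The rule of thumb is easy to evaluate. The sketch below applies it to the five-company population and to the 600-from-500-million case mentioned later in the deck; `replacement_factor` is a hypothetical name for the quantity root[(N-n)/(N-1)].

```python
import math

def replacement_factor(N, n):
    """root[(N - n) / (N - 1)]; near 1 means with and without replacement agree."""
    return math.sqrt((N - n) / (N - 1))

print(round(replacement_factor(5, 2), 3))      # 0.866
print(replacement_factor(500_000_000, 600))    # very close to 1
```

For N = 5 and n = 2 the factor is well below 1, so the two sampling schemes differ noticeably; for a huge population they are practically the same.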

CORRECTION TO PAGE 25 OF TEXT
They would have you believe the population is {8, 9, 12, 42} and the sample is {42}. A SET is a collection of distinct entities.
Population: ATT 12, IBM 42, AAA 9, Pepsi 42, GM 8, Dow 9. Sample: Pepsi 42.
WE SAMPLE COMPANIES; NUMBERS COME WITH THEM.

THE ROLE OF RANDOM SAMPLING
IF THE OVERWHELMING MAJORITY OF SAMPLES ARE “GOOD SAMPLES” THEN WE CAN OBTAIN A “GOOD” SAMPLE BY RANDOM SELECTION.

HOW TO SAMPLE RANDOMLY? SELECTING A LETTER AT RANDOM
Digits are made to correspond to letters: a = …, b = …, …, z = 75-77.
Random digits then give random letters: … (Table 14, pg. 809) etc. … (split into pairs) f t * w etc. … (take chosen letters)
For samples without replacement just pass over any duplicates.
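
The digit-pair scheme can be sketched as follows. Only z = 75-77 survives in the slide text, so the full mapping used here (a = 00-02, b = 03-05, and so on, with 78-99 unassigned and shown as *) is an assumption inferred from that one entry. A pseudo-random generator stands in for the printed table of random digits, and all function names are hypothetical.

```python
import random
import string

def pair_to_letter(pair):
    """Map a two-digit number 00-99 to a letter, or '*' if unassigned.

    Assumed mapping: each letter owns three consecutive pairs, so z = 75-77.
    """
    if pair < 78:
        return string.ascii_lowercase[pair // 3]
    return "*"

def sample_letters(n, rng, with_replacement=True):
    """Draw n letters; without replacement, pass over any duplicates."""
    chosen = []
    while len(chosen) < n:
        pair = rng.randrange(100)   # stands in for two digits read from a table
        letter = pair_to_letter(pair)
        if letter == "*":
            continue                # unassigned pair, skip it
        if not with_replacement and letter in chosen:
            continue                # duplicate, pass over it
        chosen.append(letter)
    return chosen

rng = random.Random(2006)           # fixed seed so the run is repeatable
print(sample_letters(5, rng, with_replacement=False))
```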

The Great Trick is far more powerful than we have seen
A typical sample closely estimates such things as a population mean or the shape of a population density. But it goes beyond this to reveal how much variation there is among sample means and sample densities. A typical sample not only estimates population quantities; it also estimates the sample-to-sample variations of its own estimates.

EXAMPLE: ESTIMATING A MEAN
The average account balance is $… for a random with-replacement sample of 50 accounts. We estimate from this sample that the average balance is $… for all accounts. From this sample we also estimate and display a “margin of error”: $… +/- $65.22. Here s denotes "sample standard deviation."

SAMPLE STANDARD DEVIATION
NOTE: Sample standard deviation s may be calculated in several equivalent ways, some sensitive to rounding errors, even for n = 2.

EXAMPLE : MARGIN OF ERROR CALCULATION
The following margin of error calculation for n = 4 is only an illustration. A sample of four would not be regarded as large enough.
Profits per sale = {12.2, 15.3, 16.2, 12.8}. Mean = 14.125, s = …, root(4) = 2.
Margin of error = +/- 1.96 (s / 2). Report: 14.125 +/- …
The interval 14.125 +/- … is called a “95% confidence interval for the population mean.” A precise interpretation of margin of error will be given later in the course, including the role of 1.96.
We used: ( … )^2 + ( … )^2 + ( … )^2 + ( … )^2 = …
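
The illustration can be worked through numerically. This is a sketch under two assumptions that the transcript does not confirm: that s uses the n - 1 divisor (as `statistics.stdev` does) and that the margin of error is 1.96 s / root(n), matching the 1.96 used in the percentage example.

```python
import math
import statistics

profits = [12.2, 15.3, 16.2, 12.8]
n = len(profits)

mean = statistics.mean(profits)
s = statistics.stdev(profits)          # assumption: divides by n - 1
margin = 1.96 * s / math.sqrt(n)       # assumption: 1.96 s / root(n)

print(round(mean, 3))     # 14.125
print(round(s, 3))        # 1.928
print(round(margin, 2))   # 1.89
```

If the course defines s with divisor n instead, `statistics.pstdev` would be the matching call and the numbers would be somewhat smaller.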

EXAMPLE : ESTIMATING A PERCENTAGE
A random with-replacement sample of 50 stores participated in a test marketing. In 39 of these 50 stores (i.e. 78%) the new package design outsold the old package design. We estimate from this sample that 78% of all stores will sell more of new vs old. We also estimate a “margin of error” of +/- 11.5%.
Figured in the Binomial setup: 1.96 root(pHAT qHAT)/root(n) = 1.96 root(0.78 × 0.22)/root(50) ~ 0.115
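
The 11.5% figure follows from the formula on the slide, as this short sketch shows. The variable names are illustrative.

```python
import math

n = 50
p_hat = 39 / n          # sample fraction of stores where the new design won
q_hat = 1 - p_hat

margin = 1.96 * math.sqrt(p_hat * q_hat) / math.sqrt(n)

print(p_hat)              # 0.78
print(round(margin, 3))   # 0.115
```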

SAMPLING ONLY 600 FROM 500 MILLION?
A sample of only n = 600 from a population of N = 500 million.
Sample mean = 32.84, POP mean = 32.02.
At FINE resolution the sample and population densities are very close.
