Copyright (c) Bani K. Mallick. STAT 651 Lecture #15.


1 STAT 651 Lecture #15

2 Topics in Lecture #15: Some basic probability. The binomial distribution. Inference about a single population proportion.

3 Book Sections Covered in Lecture #15: Chapters 4.7-4.8. Chapter 10.2.

4 Lecture 14 Review: Nonparametric Methods. Replace each observation by its rank in the pooled data. Do the usual ANOVA F-test. Kruskal-Wallis.

5 Lecture 14 Review: Nonparametric Methods. Once you have decided that the populations are different in their means, there is no version of an LSD. You simply have to do each comparison in turn. This is a bit of a pain in SPSS, because you physically must do each 2-population comparison, defining the groups as you go.

6 Categorical Data: Not all experiments are based on numerical outcomes. We will deal with categorical outcomes, i.e., outcomes that for each individual are a category. The simplest categorical variable is binary: success or failure, male or female.

7 Categorical Data: For example, consider flipping a fair coin, and let X = 0 mean “tails” and X = 1 mean “heads”.

8 Categorical Data: The fraction of the population who are “successes” will be denoted by the Greek symbol $\pi$. Note that because it is a Greek symbol, it represents something to do with a population. For coin flipping, if you flipped all the fair coins in the world (the population), the fraction of the times they turn up heads equals $\pi$.

9 Categorical Data: The fraction of the population who are “successes” will be denoted by the Greek symbol $\pi$. The fraction of the sample of size n who are “successes” is going to be denoted by $\hat{\pi}$. We want to relate $\hat{\pi}$ to $\pi$. Let X = number of successes in the sample. The fraction $\hat{\pi}$ = (# successes)/n = X / n.

10 Categorical Data: Suppose you flip a coin 10 times, and get 6 heads. The proportion of heads = 0.60. The percentage of heads = 60%.

11 Categorical Data: The number of successes X in n experiments, each with probability of success $\pi$, is called a binomial random variable. There is a formula for this: $\Pr(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$, where 0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc.

12 Categorical Data: $\Pr(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$, where 0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc. The idea is to relate the sample fraction to the population fraction using this formula. Key Point: if we knew $\pi$, then we could entirely characterize the fraction of experiments that have k successes.
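To make the key point concrete, here is a minimal Python sketch of the binomial formula. The function name `binomial_pmf` and the choice of 10 flips of a fair coin are illustrative assumptions, not part of the lecture. Once $\pi$ is fixed, the whole distribution of the number of successes follows.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, pi):
    """Pr(X = k) = n! / (k! (n-k)!) * pi^k * (1 - pi)^(n - k)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


# Illustrative assumption: 10 flips of a fair coin, so pi = 1/2.
n, pi = 10, Fraction(1, 2)

# The full distribution of X = number of heads in n flips.
distribution = {k: binomial_pmf(k, n, pi) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"Pr(X = {k:2d}) = {float(prob):.4f}")
```

Using exact fractions avoids rounding error, so the probabilities can be checked to add up to exactly 1.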

13 Categorical Data: The probability that the coin lands on heads will be denoted by the Greek symbol $\pi$. Suppose you flip a coin 2 times, and count the number of heads. So here, X = number of heads that arise when you flip a coin 2 times. X takes on the values 0, 1 and 2. $\hat{\pi}$ takes on the values 0/2, 1/2, 2/2.

14 Categorical Data: What the binomial formula does. The experiment results in 4 equally likely outcomes: each occurs 1/4 of the time.

                    Tails on toss #1   Heads on toss #1
Tails on toss #2          1/4                1/4
Heads on toss #2          1/4                1/4

15 Categorical Data: Heads = “success”:

                    Tails on toss #1   Heads on toss #1
Tails on toss #2          1/4                1/4
Heads on toss #2          1/4                1/4

The binomial formula can be used to give these results without thinking.

16 Categorical Data: 0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc. For one head in two flips of a fair coin, n = 2, k = 1, k! = 1, n! = 2, (n-k)! = 1, so $\Pr(X = 1) = \frac{2!}{1!\,1!}\,(1/2)^{1}(1/2)^{1} = 1/2$. The binomial formula gives the answer 1/2, which we know to be correct.
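The two-flip table can be checked against the formula by brute force. The sketch below, with illustrative names of my own choosing, lists the 4 equally likely outcomes, counts heads in each, and compares the resulting fractions with the binomial formula for n = 2 and $\pi$ = 1/2.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def binomial_pmf(k, n, pi):
    """Binomial formula written with factorials, as on the slides."""
    coefficient = Fraction(factorial(n), factorial(k) * factorial(n - k))
    return coefficient * pi**k * (1 - pi) ** (n - k)


# All outcomes of two tosses: 0 = tails, 1 = heads.
outcomes = list(product([0, 1], repeat=2))

# Count how many of the 4 equally likely outcomes give 0, 1, or 2 heads.
heads_counts = Counter(sum(outcome) for outcome in outcomes)
by_counting = {k: Fraction(c, len(outcomes)) for k, c in heads_counts.items()}

# The same probabilities from the formula.
by_formula = {k: binomial_pmf(k, 2, Fraction(1, 2)) for k in range(3)}

print(by_counting == by_formula)
```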

17 Categorical Data: Roll a fair die. [Table: first die, faces 1 to 6.] Every combination is equally likely, so what are the probabilities?

18 Categorical Data: Roll a fair die. [Table: first die, faces 1 to 6, each with probability 1/6.] Every combination is equally likely, so what are the probabilities?

19 Categorical Data: Roll a fair die. [Table: first die, faces 1 to 6, each with probability 1/6.] Every combination is equally likely, so what are the probabilities? What is the chance of rolling a 1 or a 2?

20 Categorical Data: Roll a fair die. [Table: first die, faces 1 to 6, each with probability 1/6.] Every combination is equally likely, so what are the probabilities? What is the chance of rolling a 1 or a 2? 2/6 = 1/3.

21 Categorical Data: Now roll two fair dice. [Table: first die (columns 1 to 6) by second die (rows 1 to 6).] Every combination is equally likely, so what are the probabilities?

22 Categorical Data: Roll two fair dice. [Table: first die (columns 1 to 6) by second die (rows 1 to 6); each of the 36 cells has probability 1/36.] Every combination is equally likely, so what are the probabilities?

23 Categorical Data: Roll two fair dice. [Table: first die (columns 1 to 6) by second die (rows 1 to 6); each of the 36 cells has probability 1/36.] Define a success as rolling a 1 or a 2. What is the chance of two successes?

24 Categorical Data: Roll two fair dice. [Table: first die (columns 1 to 6) by second die (rows 1 to 6); each of the 36 cells has probability 1/36.] Define a success as rolling a 1 or a 2. What is the chance of two successes? 4/36 = 1/9.

25 Categorical Data: Roll two fair dice. [Table: first die (columns 1 to 6) by second die (rows 1 to 6); each of the 36 cells has probability 1/36.] Define a success as rolling a 1 or a 2. What is the chance of two failures? 16/36 = 4/9.

26 Categorical Data: So, a success occurs when you roll a 1 or a 2. Pr(success on a single die) = 2/6 = 1/3 = $\pi$. Pr(2 successes) = 1/3 x 1/3 = 1/9. Use the binomial formula for Pr(X = k) when n = 2 and k = 2: k! = 2, n! = 2, (n-k)! = 1, so $\Pr(X = 2) = \frac{2!}{2!\,0!}\,(1/3)^{2}(2/3)^{0} = 1/9$.
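The 36-cell table can be rebuilt in a few lines. This is only a sketch with illustrative names: it enumerates every pair of faces, counts the successes (a 1 or a 2) in each pair, and turns the counts into probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
rolls = list(product(faces, repeat=2))  # all 36 (first die, second die) pairs


def successes(roll):
    """Number of dice in the roll showing a 1 or a 2."""
    return sum(1 for face in roll if face in (1, 2))


# Every pair is equally likely, so probability = (number of cells) / 36.
counts = Counter(successes(roll) for roll in rolls)
probs = {k: Fraction(counts[k], len(rolls)) for k in sorted(counts)}

for k, prob in probs.items():
    print(f"Pr({k} successes) = {counts[k]}/36 = {prob}")
```

The two-success and two-failure rows reproduce the 4/36 and 16/36 read off the table.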

27 Categorical Data: In other words, the binomial formula works in these simple cases, where we can draw nice tables. Now think of rolling 4 dice, and ask the chance that 3 of the 4 times you get a 1 or a 2. Too big a table: need a formula.
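For the four-dice question the table would need 6 x 6 x 6 x 6 = 1296 cells, which is tedious by hand but easy for a computer. The sketch below (names are illustrative) evaluates the binomial formula with n = 4, k = 3, $\pi$ = 1/3 and confirms it by counting cells.

```python
from fractions import Fraction
from itertools import product
from math import comb

pi = Fraction(1, 3)  # Pr(a single die shows a 1 or a 2)
n, k = 4, 3

# Binomial formula: n! / (k! (n-k)!) * pi^k * (1 - pi)^(n - k).
by_formula = comb(n, k) * pi**k * (1 - pi) ** (n - k)

# Brute force: go through every cell of the "too big" table.
rolls = list(product(range(1, 7), repeat=n))
favourable = sum(
    1 for roll in rolls if sum(face in (1, 2) for face in roll) == k
)
by_counting = Fraction(favourable, len(rolls))

print(by_formula, by_counting)
```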

28 Categorical Data: Does it matter what you call a “success” and what you call a “failure”? No, as long as you keep track. For example, in a class experiment many years ago, men were asked whether they preferred to wear boxers or briefs. This is binary, because there are only 2 outcomes. “success” = ?????

29 Categorical Data: Binary experiments have sampling variability, just like sample means, etc. Experiment: “success” = being under 5’10” in height. First 6 men with SSN < 5. First 6 men with SSN > 5. Note how the number of “successes” was not the same! (I might have to do this a few times.)

30 Categorical Data: The sample fraction is a random variable. This means that if I do the experiment over and over, I will get different values. These different values have a standard deviation.

31 Categorical Data: The sample fraction $\hat{\pi}$ has a standard error. Its standard error is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$. Note how if you have a bigger sample, the standard error decreases. The standard error is biggest when $\pi = 0.50$.

32 Categorical Data: The sample fraction $\hat{\pi}$ has a standard error. Its standard error is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$. The estimated standard error based on the sample is $\hat{\sigma}_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n}$.
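A small simulation can illustrate what "standard error" means here: repeat the experiment many times and look at how much the sample fraction varies. The settings below ($\pi$ = 0.30, n = 25, 20,000 repetitions, a fixed random seed) are assumptions chosen for illustration, not values from the lecture.

```python
import random
from math import sqrt
from statistics import pstdev


def standard_error(pi, n):
    """Standard error of the sample fraction: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)


# Assumed settings for the illustration.
pi, n, repetitions = 0.30, 25, 20_000
rng = random.Random(0)  # fixed seed so the run is reproducible

# Each repetition: n binary trials, then record the sample fraction X / n.
sample_fractions = [
    sum(rng.random() < pi for _ in range(n)) / n for _ in range(repetitions)
]

simulated_sd = pstdev(sample_fractions)
theoretical_se = standard_error(pi, n)

print(f"simulated SD of sample fraction: {simulated_sd:.4f}")
print(f"formula sqrt(pi(1-pi)/n):        {theoretical_se:.4f}")
```

Calling `standard_error` with a larger n, or with $\pi$ further from 0.50, gives a smaller value, in line with the two remarks on the slide.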

33 Categorical Data: It is possible to make confidence intervals for the population fraction if the number of successes > 5, and the number of failures > 5. If this is not satisfied, consult a statistician. Under these conditions, the Central Limit Theorem says that the sample fraction is approximately normally distributed (in repeated experiments).

34 Categorical Data: A $(1-\alpha)100\%$ CI for the population fraction is $\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$, where $z_{\alpha/2}$ is found by looking up $1-\alpha/2$ in Table 1.

35 Categorical Data: Often, you will only know the sample proportion/percentage and the sample size. Computing the confidence interval for the population proportion: two ways. By hand. By SPSS (this is a pain if you do not have the data entered already). Because you may need to do this by hand, I will make you do this.

36 Categorical Data: A $(1-\alpha)100\%$ CI for the population fraction is $\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$. For a 95% CI, $z_{\alpha/2} = 1.96$. Here n = 25 and $\hat{\pi} = 0.30$.

37 Categorical Data: The 95% CI for the population fraction is $0.30 \pm 1.96\sqrt{0.30(0.70)/25} = 0.30 \pm 0.18$. Interpretation?

38 Categorical Data: The 95% CI for the population fraction is $0.30 \pm 1.96\sqrt{0.30(0.70)/25} = 0.30 \pm 0.18$. Interpretation? The proportion of successes in the population is from 0.12 to 0.48 (12% to 48%) with 95% confidence.
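The by-hand calculation can be sketched in Python as follows. The function name `proportion_ci` is an illustrative choice, and the interval used is the normal-approximation one, sample fraction plus or minus 1.96 estimated standard errors, with the slide's n = 25 and sample fraction 0.30.

```python
from math import sqrt


def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    estimated_se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * estimated_se
    return p_hat - margin, p_hat + margin


# Values from the slide: n = 25, sample fraction = 0.30, z = 1.96 for 95%.
lower, upper = proportion_ci(0.30, 25)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # prints "95% CI: 0.12 to 0.48"
```

Passing a different `z` (for example 1.645 for a 90% interval) changes the confidence level without touching the rest of the calculation.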

39 Categorical Data: You can use SPSS as long as the number of successes and the number of failures both exceed 5. To get the confidence intervals, you first have to define a numeric version of your variable that classifies whether an observation is a success or failure. You then compute the 1-sample confidence interval from “Descriptives”, “Explore”. Demo.

40 Categorical Data: If you set up your data in SPSS, the “mean” will be the proportion/fraction/percentage of 1’s. Data = 0 1 1 1 0 0 0 1 0 0. n = 10. Mean = 4/10 = .40, so $\hat{\pi}$ = .40.
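The same idea works outside SPSS: code successes as 1 and failures as 0, and the mean is the sample fraction. The sketch below uses the ten data values from the slide. The check against the "more than 5 successes and more than 5 failures" rule is added here for illustration.

```python
# Data from the slide: 1 = success, 0 = failure.
data = [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]

n = len(data)
n_successes = sum(data)
n_failures = n - n_successes

# The mean of 0/1 data is the sample fraction of successes.
p_hat = n_successes / n
print(f"n = {n}, successes = {n_successes}, sample fraction = {p_hat:.2f}")

# Rule from the lecture: both counts should exceed 5 before making a CI.
ci_rule_satisfied = n_successes > 5 and n_failures > 5
print(f"CI rule satisfied: {ci_rule_satisfied}")
```

With only 4 successes, this toy data set does not meet the rule, so it shows the coding trick but is too small for the confidence interval.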

41 Boxers versus briefs for males: In this output, boxers = 1 and briefs = 0.

42 Boxers versus briefs for males: what % prefer boxers? In the sample, 46.81%. In the population???

Descriptives: Boxers or Briefs Preference
                                                Statistic   Std. Error
Mean                                            .4681       3.649E-02
95% Confidence Interval for Mean   Lower Bound  .3961
                                   Upper Bound  .5401
5% Trimmed Mean                                 .4645
Median                                          .0000
Variance                                        .250
Std. Deviation                                  .5003
Minimum                                         .00
Maximum                                         1.00
Range                                           1.00
Interquartile Range                             1.0000
Skewness                                        .129        .177
Kurtosis                                        -2.005      .353

In this output, boxers = 1 and briefs = 0. The proportion of 1’s is the mean.

43 Boxers versus briefs for males: what % prefer boxers? Between 39.61% and 54.01%. [SPSS Descriptives output for Numeric Boxers (0 = Briefs, 1 = Boxers), Gender = Male: Mean .4681 (Std. Error 3.649E-02), 95% Confidence Interval for Mean .3961 to .5401.]

44 Boxers versus briefs: In the sample, 46.81% of the men preferred boxers to briefs: 53.19% preferred briefs. Between 39.61% and 54.01% of men prefer boxers to briefs (95% CI). Is there enough evidence to conclude that men generally prefer briefs?

45 Boxers versus briefs: In the sample, 46.81% of the men preferred boxers to briefs: 53.19% preferred briefs. Between 39.61% and 54.01% of men prefer boxers to briefs (95% CI). Is there enough evidence to conclude that men generally prefer briefs? No: since 50% is in the CI! This means that it is possible (95% CI) that 50% prefer boxers and 50% prefer briefs, i.e., $\pi = 0.50$.

46 Sample Size Calculations: The standard error of the sample fraction is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$. If you want a $(1-\alpha)100\%$ CI to be $\hat{\pi} \pm E$, you should set $E = z_{\alpha/2}\sqrt{\pi(1-\pi)/n}$.

47 Sample Size Calculations: This means that $n = z_{\alpha/2}^{2}\,\pi(1-\pi)/E^{2}$.

48 Sample Size Calculations: The small problem is that you do not know $\pi$. You have two choices: (1) Make a guess for $\pi$. (2) Set $\pi = 0.50$ and calculate (most conservative, since it results in the largest sample size). Most polling operations make the latter choice, since it is most conservative.

49 Sample Size Calculations: Examples. Set E = 0.04, 95% CI. You guess that $\pi = 0.30$: $n = (1.96)^{2}(0.30)(0.70)/(0.04)^{2} = 504.21$. You have no good guess: set $\pi = 0.50$, so $n = (1.96)^{2}(0.50)(0.50)/(0.04)^{2} = 600.25$.
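The sample size rule is easy to wrap in a small function. This is a sketch under two assumptions that are mine rather than the lecture's: the function name `sample_size`, and rounding the result up to the next whole person so that the margin of error is no larger than E.

```python
from math import ceil


def sample_size(E, pi=0.50, z=1.96):
    """n = z^2 * pi * (1 - pi) / E^2, rounded up (rounding up is an assumption).

    The default pi = 0.50 is the conservative choice when no guess is available.
    """
    return ceil(z**2 * pi * (1 - pi) / E**2)


# Examples from the slide: E = 0.04 with a 95% CI (z = 1.96).
n_with_guess = sample_size(0.04, pi=0.30)  # you guess pi = 0.30
n_conservative = sample_size(0.04)         # no good guess, so pi = 0.50

print(n_with_guess, n_conservative)  # prints "505 601"
```

The conservative choice costs roughly 100 extra observations here, which is the price of not having a guess for the population fraction.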

