Review Law of averages, expected value and standard error, normal approximation, surveys and sampling.


1 Review Law of averages, expected value and standard error, normal approximation, surveys and sampling

2 Basic concepts

3 Example According to genetic theory, there is very close to an even chance that both children in a two-child family will be of the same sex. Which is more likely? (i) 15 couples have two children each. In 10 or more of the families, both children are of the same sex. (ii) 30 couples have two children each. In 20 or more of the families, both children are of the same sex. Answer: (i). By the law of averages, the percentage of same-sex families is likely to be closer to the expected 50% when there are more families. Both options require at least 2/3 of the families (10/15 = 20/30 = 2/3), and a percentage that far above 50% is less likely with 30 families than with 15.
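
The comparison can be checked by computing both chances exactly. This is a minimal sketch, assuming the families are independent and the chance of a same-sex pair is exactly 1/2; the function name `prob_at_least` is made up for illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Chance of k or more successes in n independent trials, each with chance p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_15 = prob_at_least(10, 15)  # option (i): 10 or more of 15 families
p_30 = prob_at_least(20, 30)  # option (ii): 20 or more of 30 families

print(f"(i)  {p_15:.4f}")
print(f"(ii) {p_30:.4f}")
```

Option (i) comes out at about 15% and option (ii) at about 5%, in line with the answer above.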

4 Example A die will be thrown some number of times, and the object is to guess the total number of spots. There is a one-dollar penalty for each spot that the guess is off. For instance, if you guess 200 and the total is 215, you lose $15. Which do you prefer: 50 throws, or 100? Answer: 50. The best guess is the expected value of the total. With more throws, the chance error in the total is likely to be larger, because the SE for the sum grows like the square root of the number of throws. So with 100 throws you are likely to lose more money.
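
A short sketch of the numbers behind this answer, assuming a fair die modeled as a box with tickets 1 through 6. The helper name `sum_ev_se` is illustrative only.

```python
from math import sqrt

faces = [1, 2, 3, 4, 5, 6]
box_avg = sum(faces) / len(faces)
# SD of the box: root-mean-square deviation from the average.
box_sd = sqrt(sum((x - box_avg) ** 2 for x in faces) / len(faces))

def sum_ev_se(n_throws):
    """Expected value and SE for the total number of spots in n_throws throws."""
    return n_throws * box_avg, sqrt(n_throws) * box_sd

for n in (50, 100):
    ev, se = sum_ev_se(n)
    print(f"{n} throws: guess {ev:.0f}, likely to be off by about {se:.1f}")
```

The SE is about 12 spots for 50 throws and about 17 spots for 100 throws, so the guess is likely to be off by more with 100 throws.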

5 Remark The SE for number and SE for percentage behave quite differently: The SE for number will go up like the square root of the number of draws. The SE for percentage will go down like the square root of the number of draws.
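
The two behaviors can be seen side by side in a small sketch, assuming a 0-1 box in which a fraction `p` of the tickets are 1's and draws are made with replacement. The function names are illustrative.

```python
from math import sqrt

def se_number(n_draws, p):
    """SE for the number of 1's: sqrt(number of draws) times the SD of the box."""
    return sqrt(n_draws) * sqrt(p * (1 - p))

def se_percent(n_draws, p):
    """SE for the percentage of 1's: SE for number over number of draws, times 100%."""
    return se_number(n_draws, p) / n_draws * 100

for n in (100, 400, 1600):
    print(n, se_number(n, 0.5), se_percent(n, 0.5))
```

Multiplying the number of draws by 4 doubles the SE for number and halves the SE for percentage.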

6 Basic concepts What is a probability histogram? A graph that represents chance, not data. What is the relationship between the empirical histogram for the observed data and the ideal probability histogram? If the chance process is repeated many times, the empirical histogram converges to the probability histogram. Usually the process is about the sum of draws. What if the process is about the product of draws? The convergence still applies.
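
The convergence can be illustrated by simulation. This is a sketch under assumed settings (two draws from a box with tickets 1 through 6, a fixed random seed); the names `probability_histogram` and `max_gap` are made up for this example.

```python
import random
from collections import Counter
from operator import add, mul

def probability_histogram(combine):
    """Exact chance of each value of combine(a, b) for two draws from the box 1..6."""
    hist = Counter()
    for a in range(1, 7):
        for b in range(1, 7):
            hist[combine(a, b)] += 1 / 36
    return hist

def max_gap(combine, repetitions, seed=0):
    """Largest gap between empirical and probability histogram bars."""
    rng = random.Random(seed)
    counts = Counter(
        combine(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(repetitions)
    )
    exact = probability_histogram(combine)
    return max(abs(counts[v] / repetitions - chance) for v, chance in exact.items())

for reps in (100, 10_000, 100_000):
    print(reps, round(max_gap(add, reps), 4), round(max_gap(mul, reps), 4))
```

Passing `add` gives the sum of the draws and `mul` gives the product; in both cases the gap tends to shrink as the number of repetitions grows.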

7 Basic concepts What is the central limit theorem? When drawing at random with replacement from a box, the probability histogram for the sum will follow the normal curve, provided that the histogram is put into standard units and the number of draws is reasonably large. What if the process is about the product of draws? The theorem is about sums, so the normal approximation does not apply to the product.
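
As a rough check of the theorem, the exact probability histogram for the sum of 100 die throws can be compared with the normal curve. This is a sketch, assuming a fair die and using a continuity correction (half a unit added at each end of the interval), which the slide does not mention.

```python
from math import ceil, erf, floor, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def sum_distribution(n_throws):
    """Exact chances for the total of n_throws fair-die throws, as {total: chance}."""
    dist = {0: 1.0}
    for _ in range(n_throws):
        new = {}
        for total, chance in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + chance / 6
        dist = new
    return dist

n = 100
ev = 3.5 * n                  # expected value of the sum
se = sqrt(n) * sqrt(35 / 12)  # SE for the sum; sqrt(35/12) is the SD of the box
lo, hi = ceil(ev - se), floor(ev + se)  # whole-number totals within 1 SE of the EV

exact = sum(c for t, c in sum_distribution(n).items() if lo <= t <= hi)
approx = normal_cdf((hi + 0.5 - ev) / se) - normal_cdf((lo - 0.5 - ev) / se)
print(f"exact {exact:.4f}, normal approximation {approx:.4f}")
```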

8 Basic concepts What is a population? What is a sample? In a survey, a population is the group of subjects that we want to study. A sample is part of the population. It will represent some properties of the population. We study samples when it is impractical to study the whole population. What is a parameter? A parameter is a numerical fact about a population. Usually a parameter cannot be determined exactly, but can only be estimated.

9 Basic concepts What is a statistic? A statistic is an estimate of the parameter, and it can be computed from a sample. A statistic is what we know. The parameter is what we want to know. What are the two main biases we studied in class? Selection bias and non-response bias.

10 Basic concepts How do we determine if there is selection bias in a survey? Look for warning signs: the interviewers have discretion in choosing subjects, the investigator or survey designer has discretion, the selection procedure does not use probability methods so that individuals do not have an even chance of being chosen, and so on. How do we determine if there is non-response bias in a survey? The lifestyle of the non-respondents can be very different from that of the respondents. We may also calculate the non-response rate: personal interviews 65% and mailed questionnaires 25%. (threshold)

11 Basic concepts What is the best method to draw a sample in a survey? Probability methods. What is the simplest probability method? Simple random sampling. Is it practical? No. The list of names is too long, it is not easy to send out interviewers to find the selected individuals, and so on. What other probability methods have we studied? Multistage cluster sampling and random digit dialing (RDD) for telephone surveys.

12 Basic concepts According to the equation: statistic = parameter + chance error, what is sample percentage and what is population percentage in a sampling process? Do they have to be equal? The population percentage is the parameter, or the expected value. The sample percentage is the statistic, or the estimate. They do not have to be equal: the sample percentage is usually off by a chance error, whose likely size is measured by the SE for percentage. According to the square root law, what determines the accuracy of the sampling process? When the sample is only a small part of the population, it is mainly the sample size that determines the accuracy. The population size has almost no influence on it.
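
The square root law can be illustrated numerically. This sketch assumes a 0-1 box and draws without replacement, and it uses the standard correction factor sqrt((N - n) / (N - 1)), which this slide does not state; the function name and the population and sample sizes are made up for illustration.

```python
from math import sqrt

def se_percent(sample_size, population_size, p):
    """SE for a sample percentage when drawing without replacement."""
    se_with_replacement = sqrt(p * (1 - p)) / sqrt(sample_size) * 100
    # Correction factor for drawing without replacement (assumed, not from the slide).
    correction = sqrt((population_size - sample_size) / (population_size - 1))
    return se_with_replacement * correction

# Same sample size, very different population sizes: almost the same SE.
print(se_percent(2500, 100_000, 0.5))
print(se_percent(2500, 10_000_000, 0.5))

# Same population, sample size multiplied by 4: the SE is roughly cut in half.
print(se_percent(625, 10_000_000, 0.5))
```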

13 Calculation and formula

14

15 Models and Examples

16

17 Normal approximation

18

19 Models and Examples Sampling process: A group of 50,000 tax forms has an average gross income of $37,000, with an SD of $20,000. About 20% of the forms have a gross income over $50,000. A group of 900 forms is chosen at random for audit. Q1: estimate the probability that between 19% and 21% of the forms chosen for audit have gross incomes over $50,000. Q2: estimate the probability that the total gross income of the audited forms is over $33,000,000. (This question can also be restated in terms of the average: the average gross income is over $33,000,000/900.)

20 Solutions
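
One way to work Q1 and Q2 with the normal approximation is sketched here; it is not necessarily the slides' own solution. It assumes the 900 forms are a small enough part of the 50,000 that the draws can be treated as made with replacement (no correction factor), and it uses `math.erf` in place of a normal table.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n_draws = 900

# Q1: 0-1 box with 20% 1's (forms with gross income over $50,000).
p = 0.20
se_pct = sqrt(p * (1 - p)) / sqrt(n_draws) * 100  # SE for the percentage, in points
z1 = (21 - 20) / se_pct                           # 21% in standard units
q1 = normal_cdf(z1) - normal_cdf(-z1)             # area between 19% and 21%

# Q2: box of gross incomes, average $37,000 and SD $20,000.
box_avg, box_sd = 37_000, 20_000
ev_sum = n_draws * box_avg        # expected total
se_sum = sqrt(n_draws) * box_sd   # SE for the total
z2 = (33_000_000 - ev_sum) / se_sum
q2 = 1 - normal_cdf(z2)           # area to the right of $33,000,000

print(f"Q1: about {q1:.0%}")
print(f"Q2: about {q2:.0%}")
```

Under these assumptions, 21% is 0.75 SEs above the expected 20%, and $33,000,000 is 0.5 SEs below the expected total of $33,300,000.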

21

22

23 Example Suppose that in a calculus test, a group of 10,000 students has an average score of 70, with an SD of 10. A group of 400 students is chosen at random for sampling. Q: Assuming the scores follow the normal curve, can we estimate the probability that between 14% and 18% of the students chosen for sampling have scores above 80? Answer: Yes, we can!

24 Solution In the box model, we have 10,000 tickets, average = 70, SD = 10, 400 draws. This problem is about determining whether or not each student has a score above 80, so it is a counting process. We need a new box model: a 0-1 box. There are 10,000 tickets, but we don’t know the percentage of 1’s and 0’s. We first have to use the normal curve to estimate it, because we have assumed that the data follow the normal curve. In the original box, with average = 70 and SD = 10, the score 80 is converted to 1 in standard units. From the normal table, the area to the right of 1 is about 16%. That is, in the population of 10,000 students, about 16% of the students have scores above 80. So in the new 0-1 box, 16% are 1’s and the rest are 0’s.

25 Solution
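
A sketch of how the calculation could continue from the 0-1 box with 16% 1's; this is one possible completion, not necessarily the slide's own. It assumes the 400 draws can be treated as made with replacement (no correction factor) and uses `math.erf` in place of a normal table.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Step 1: fraction of scores above 80 under the normal curve (about 16%).
z_80 = (80 - 70) / 10
tail = 1 - normal_cdf(z_80)

# Step 2: 0-1 box with 16% 1's (the rounded table value), 400 draws.
p, n_draws = 0.16, 400
se_pct = sqrt(p * (1 - p)) / sqrt(n_draws) * 100  # SE for the percentage, in points

# Step 3: chance that the sample percentage is between 14% and 18%.
z = (18 - 16) / se_pct
chance = normal_cdf(z) - normal_cdf(-z)

print(f"area to the right of 80: {tail:.1%}")
print(f"SE for percentage: {se_pct:.2f} points")
print(f"chance of 14% to 18%: about {chance:.0%}")
```

Under these assumptions the SE for the percentage is a little under 2 points, so 14% and 18% are each about 1.1 SEs from the expected 16%.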

26 Good Luck!

