Chapter 3: Producing Data


1 Chapter 3: Producing Data
Inferential Statistics Sampling Designing Experiments

2 Inferential Statistics
We start with a question about a group or groups. The group or groups we are interested in are called the population(s). Examples: What is the average number of car accidents for a person over 65 in the United States? For the entire world, is the IQ of women the same as the IQ of men? How many times a day should I feed my goldfish? Which is more effective at lowering the heart rate of mice: no drug (control), drug A, drug B, or drug C?

3 Inferential Statistics
Example 1: What is the average number of car accidents for a person over 65 in the United States? How many populations are of interest? One. What is the population of interest? All people in the U.S. over age 65.

4 Inferential Statistics
Example 2: For the entire world, is the IQ of women the same as the IQ of men? How many populations are of interest? Two. What are the populations of interest? All women and all men.

5 Inferential Statistics
Example 3: How many times a day should I feed my goldfish? How many populations are of interest? One. What is the population of interest? All pet goldfish.

6 Inferential Statistics
Example 4: Which is more effective at lowering the heart rate of mice: no drug (control), drug A, drug B, or drug C? How many populations are of interest? Four. What are the populations of interest? All mice taking no drug, all mice taking drug A, all mice taking drug B, and all mice taking drug C.

7 Inferential Statistics
Suppose we have no previous information about these questions. How could we answer them? Census. Advantages: we get everyone, so we know the truth. Disadvantages: expensive, difficult to obtain, may be impossible. Sample. Advantages: less expensive, feasible. Disadvantages: uncertainty about the truth; instead of surety we may have error.

8 Inferential Statistics
Suppose we have no previous information about these questions. How could we answer them? If we take a census, we have everyone and we have no need for inference. We know. If we take a sample, we make inference from the sample to the whole population. For these four questions, it is not likely we can get a census. We will need to use a sample. Obviously, for each population we are interested in, we must get a separate sample.

9 Inferential Statistics
General Idea of Inferential Statistics We take a sample from the whole population. We summarize the sample using important statistics. We use those summaries to make inference about the whole population. We realize there may be some error involved in making inference.

10 Inferential Statistics
Example (1988, the Steering Committee of the Physicians' Health Study Research Group). Question: Can aspirin reduce the risk of heart attack in humans? Sample: 22,071 male physicians between the ages of 40 and 84, randomly assigned to one of two groups. One group took an ordinary aspirin tablet every other day (headache or not). The other group took a placebo every other day; this group is the control group. Summary statistic: The rate of heart attacks in the group taking aspirin was only 55% of the rate of heart attacks in the placebo group. Inference to population: Taking aspirin causes a lower rate of heart attacks in humans.

11 Sampling a Single Population
Basics for sampling Sampling should not be biased: no favoring of any individual in the population. Example of a biased sample Select goldfish from a particular store The selection of an individual in the population should not affect the selection of the next individual – independence. Example of non-independent sample Choosing cards from a deck without replacement

12 Sampling a Single Population
Basics for sampling. The sample should be large enough to adequately cover the population. Example of a small sample: suppose only 20 physicians were used in the aspirin study. Sampling should have the smallest variability possible. We know there is some error, and we want to minimize it.

13 Sampling a Single Population
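Sampling Techniques. Simple Random Sample (SRS): every member of the population has an equal chance of being selected, and every possible sample of the same size is equally likely to be the one chosen. (Diagram: a sample drawn at random from the population.)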

14 Sampling a Single Population
Sampling Techniques. Simple Random Sample (SRS): every member of the population has an equal chance of being selected. Assign every individual a number and randomly select 30 numbers using a random number table (or computer-generated random numbers). Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Using a random number table, select 50 of them. Table B at the back of the book is a table of random digits.
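The selection step above can be sketched in Python, with computer-generated random numbers standing in for Table B. This is only an illustration: the list `population_ids` is a hypothetical stand-in for the real list of SSNs, and the function name `simple_random_sample` is made up for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: 10,000 ID numbers standing in for the real list.
population_ids = list(range(1, 10_001))

sample = simple_random_sample(population_ids, 50, seed=1)
print(len(sample), "individuals selected, all distinct:", len(set(sample)) == 50)
```

Because `random.sample` draws without replacement, no individual can appear in the sample twice.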

15 Sampling a Single Population
Sampling Techniques. Stratified Random Sample: Divide the population into several strata. Then take an SRS from each stratum. (Diagram: a population divided into Stratum 1, Stratum 2, and Stratum 3, with part of the sample drawn from each.)

16 Sampling a Single Population
Sampling Techniques. Stratified Random Sample. Advantage: Each stratum is guaranteed to be randomly sampled. Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Divide the SSNs by region of the country (time zones). Then randomly sample 30 from each time zone.
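A minimal sketch of the stratified design, assuming the sampling frame has already been split by time zone. The stratum names and sizes below are invented for illustration, and `stratified_sample` is a hypothetical helper, not part of any library.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum individuals from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical frame: ID numbers already grouped by time zone (the strata).
strata = {
    "Eastern": list(range(0, 4000)),
    "Central": list(range(4000, 7000)),
    "Mountain": list(range(7000, 8000)),
    "Pacific": list(range(8000, 10000)),
}

sample = stratified_sample(strata, 30, seed=2)
for zone, chosen in sample.items():
    print(zone, len(chosen))
```

Every stratum contributes exactly 30 individuals, which is the guarantee the slide names as the advantage of this design.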

17 Sampling a Single Population
Sampling Techniques. Cluster Sample: Divide the population into several clusters. Then take an SRS of clusters and use all the observations in each selected cluster. (Diagram: a population divided into nine groups; groups 1, 4, and 9 are selected and together form the sample.)

18 Sampling a Single Population
Sampling Techniques. Cluster Sample. Advantage: May be the only feasible method, given resources. Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Sort the SSNs by the last 4 digits, making each set of 100 a cluster. Use a random number table to pick the clusters. You may get the 4100’s, 5600’s and 8200’s, for example.
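The cluster example can be sketched as follows, assuming each cluster is identified by the hundreds block of the last four digits (so 4127 belongs to the 4100's). The nine-digit ID numbers are randomly generated stand-ins, not real SSNs, and `cluster_sample` and `hundreds_block` are names invented for this sketch.

```python
import random
from collections import defaultdict

def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep everyone in them."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for person in population:
        clusters[cluster_of(person)].append(person)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster labels
    sample = [p for label in chosen for p in clusters[label]]
    return chosen, sample

def hundreds_block(id_number):
    """Map an ID to the hundreds block of its last four digits, e.g. 4127 -> 4100."""
    return (id_number % 10_000) // 100 * 100

# Hypothetical frame: 20,000 distinct nine-digit ID numbers.
frame_rng = random.Random(0)
population_ids = frame_rng.sample(range(100_000_000, 1_000_000_000), 20_000)

chosen_blocks, sample = cluster_sample(population_ids, hundreds_block, 3, seed=3)
print("clusters chosen:", chosen_blocks)
print("individuals in sample:", len(sample))
```

Unlike the stratified design, the randomness here is in which clusters are picked; within a chosen cluster nobody is left out.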

19 Sampling a Single Population
Sampling Techniques. Multi-Stage Sample: Divide the population into several strata. Randomly select a subset of the strata, then take an SRS from each selected stratum. (Diagram: a population divided into nine groups; groups 1, 4, and 9 are selected, and part of each is sampled.)

20 Sampling a Single Population
Sampling Techniques. Multi-Stage Sample. Advantage: May be the only feasible method, given resources. Example: Obtain a list of all SSNs for individuals in the U.S. who are over 65. Divide the SSNs by state (50 groups). Randomly select 10 states. Then randomly sample 40 from each of the selected states.
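The two stages in this example can be sketched as below. The 50 "states" and their 1,000 members each are placeholder data, and `multistage_sample` is a hypothetical helper written for this illustration.

```python
import random

def multistage_sample(groups, n_groups, n_per_group, seed=None):
    """Stage 1: SRS of groups. Stage 2: SRS of individuals within each chosen group."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_groups)
    return {g: rng.sample(groups[g], n_per_group) for g in chosen}

# Hypothetical frame: 50 states, each with 1,000 ID numbers.
states = {f"state_{i:02d}": list(range(i * 1000, (i + 1) * 1000))
          for i in range(50)}

sample = multistage_sample(states, n_groups=10, n_per_group=40, seed=4)
print("states selected:", sorted(sample))
print("total individuals:", sum(len(v) for v in sample.values()))
```

Only the ten selected states need a complete list of individuals, which is why this design can be feasible when a full national list is not.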

21 Sampling a Single Population
Sampling Problems. Voluntary response: Internet surveys, call-in surveys. Convenience sampling: sampling friends, sampling at the mall. Dishonesty: asking personal questions, not enough time to respond honestly.

22 Sampling a Single Population
Undercoverage: Some groups in the population are left out when the sample is taken. Nonresponse: An individual chosen for the sample can’t be contacted or does not cooperate. Response bias: Results are influenced by the behavior of the respondent or interviewer. For example, the wording of questions can influence the answers, and respondents may not want to give truthful answers to sensitive questions.

23 Sampling More than One Population
We sample from more than one population when we are interested in more than one variable. As previously discussed, one variable is chosen to be the response variable and the other is selected as the explanatory variable. Examples: comparing decibel levels of 4 different brands of speakers; determining time to failure of 3 different types of lightbulbs; comparing GRE scores for students from 5 different majors.

24 Sampling More than One Population
Example 1: Comparing decibel levels of 4 different brands of speakers. What is the explanatory variable? Brand. What is the response variable? Decibel level. Number of populations? Four. Number of samples needed? Four, one from each population.

25 Sampling More than One Population
Example 2: Determining time to failure of 3 different types of lightbulbs. What is the explanatory variable? Type. What is the response variable? Time to failure. Number of populations? Three. Number of samples needed? Three, one from each population.

26 Sampling More than One Population
Example 3: Comparing GRE scores for students from 5 different majors. What is the explanatory variable? Major. What is the response variable? GRE score. Number of populations? Five. Number of samples needed? Five, one from each population.

27 Sampling More than One Population
Important Considerations Each sample should represent the population it corresponds to well. Samples from more than one population should be as close to each other in every respect as possible except for the explanatory variable. Otherwise we may have confounding variables. Two variables are confounded if we cannot determine which one caused the differences in the response.

28 Sampling More than One Population
Important Considerations. Examples of Confounding: Suppose we compared the decibel levels of the four different speaker brands, each with a different measuring instrument. We wouldn’t know if the differences were due to the different brands or the different instruments. Brand and Instrument are then confounded. Suppose we compared the time to failure of the three different types of lightbulbs, each in a different light socket. We wouldn’t know if the differences were due to the different types of lightbulbs or the different light sockets. Type and Socket are then confounded.

29 Sampling More than One Population
Important Considerations. Examples of Confounding: Suppose we obtained GRE scores for each major, each from a different university. We wouldn’t know if the differences were due to the different majors or the different universities. Major and University are then confounded. Confounding can be avoided by using good sampling techniques, which will be explained shortly.

30 Sampling More than One Population
Important Considerations It is also possible that more than one (possibly several) explanatory variable can influence a given response variable. Example Perhaps both the type of lightbulb and the type of light socket influence the time to failure of a lightbulb. It is likely that different types of lightbulbs work better for different sockets. This concept is known as interaction. Interaction: The responses for the levels of one variable differ over the levels of another variable.

31 Sampling More than One Population
Randomized Experiment. The key to a randomized experiment: the treatment (explanatory variable) is randomly assigned to the experimental units or subjects. (Diagram: a simple random sample is taken from the population and split by random assignment into a control group and a treatment group; statistics from the two groups are then compared.)

32 Sampling More than One Population
Randomized Experiment. Example: Suppose that before we test the effect of aspirin on the physicians, we wish to do a study on the effect of aspirin on mice, comparing heart rates. We obtain a random sample of 100 mice. We randomly assign 50 mice to receive a placebo. We randomly assign the other 50 mice to receive aspirin. After 20 days of administering the placebo and aspirin, we measure the heart rates and obtain summary statistics for comparison.
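The random assignment step in this example can be sketched as a shuffle followed by a split. The mouse labels are placeholders, and `randomly_assign` is a hypothetical helper that assumes the number of units divides evenly among the groups.

```python
import random

def randomly_assign(units, group_names, seed=None):
    """Shuffle the units, then deal them into equal-sized groups."""
    if len(units) % len(group_names) != 0:
        raise ValueError("units must divide evenly among the groups")
    rng = random.Random(seed)
    shuffled = list(units)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {name: shuffled[i * size:(i + 1) * size]
            for i, name in enumerate(group_names)}

# Hypothetical labels for the 100 sampled mice.
mice = [f"mouse_{i}" for i in range(1, 101)]

groups = randomly_assign(mice, ["placebo", "aspirin"], seed=5)
print({name: len(members) for name, members in groups.items()})
```

Chance alone decides which group each mouse joins, so nothing about a mouse (size, age, temperament) can systematically line up with the treatment it receives.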

33 Sampling More than One Population
Randomized Experiment. The single greatest advantage of a randomized experiment is that we can infer causation. Through randomization to groups, we have controlled all other factors and eliminated the possibility of a confounding variable. Unfortunately, or perhaps fortunately, we cannot always use a randomized experiment: it is often impossible or unethical, particularly with humans.

34 Sampling More than One Population
Observational Study. We are forced to select samples from different pre-existing populations. (Diagram: a simple random sample is taken from Pre-existing Population 1 and from Pre-existing Population 2; statistics from the two samples are then compared.)

35 Sampling More than One Population
Observational Study. Advantage: The data is much easier to obtain. Disadvantages: We cannot say the explanatory variable caused the response. There may be lurking or confounding variables. Observational studies should be used more to describe the past than to predict the future. Case-Control Study: A study in which cases having a particular condition are compared to controls who do not. The purpose is to find out whether or not one or more explanatory variables are related to a certain disease. Although you can’t usually determine cause and effect, these studies are more efficient and they can reduce the potential confounding variables.

36 Sampling More than One Population
Observational Study. Example 1: Suppose we are interested in comparing GRE scores for students in five different majors. We cannot do a randomized experiment because we cannot randomly assign individuals to a specific major. The individuals decide that for themselves. Thus, we observe students from 5 different pre-existing populations: the five majors. We obtain a random sample of size 15 from each of the five majors. We calculate statistics and compare the 5 groups. Can we say being in a specific major causes someone to get a higher GRE score? What are some possible confounding variables? How might we reduce the effect of these confounding variables?

37 Sampling More than One Population
Observational Study. Example 2: Suppose we are interested in finding out which age group talks the most on the telephone: 0-10 years or one of three older age groups. We cannot do a randomized experiment because we cannot randomly assign individuals to an age group. Thus, we observe (through polling or wire tapping) individuals from 4 different pre-existing populations: the four age groups. We obtain a random sample of size 25 from each of the four age groups. We calculate statistics and compare the 4 groups. Can we say being in a specific age group causes someone to talk more on the telephone? What are some possible confounding variables? How might we control these confounding variables?

38 Inference Overview
Recall that inference is using statistics from a sample to talk about a population. We need some background in how we talk about populations and how we talk about samples.

39 Inference Overview Describing a Population
It is common practice to use Greek letters when talking about a population. We call the mean of a population μ. We call the standard deviation of a population σ and the variance σ². When we are talking about percentages, we call the population proportion π (pi). It is important to know that for a given population there is only one true mean, one true standard deviation and variance, and one true proportion. There is a special name for these values: parameters.

40 Inference Overview Describing a Sample
It is common practice to use Roman letters when talking about a sample. We call the mean of a sample x̄ (x-bar). We call the standard deviation of a sample s and the variance s². When we are talking about percentages, we call the sample proportion p. There are many different possible samples that could be taken from a given population. For each sample there may be a different mean, standard deviation, variance, or proportion. There is a special name for these values: statistics.

41 Inference Overview
We use sample statistics to make inference about population parameters.
Mean: population μ, sample x̄.
Standard deviation: population σ, sample s.
Proportion: population π, sample p.
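To make the parameter/statistic distinction concrete, the sketch below computes both for a made-up population. The scores and the "above 110" cutoff used for the proportion are invented for illustration; the only library functions used are from Python's standard `statistics` module.

```python
import random
import statistics

rng = random.Random(6)

# Hypothetical population: 1,000 scores.
population = [rng.gauss(100, 15) for _ in range(1000)]

# Parameters: one fixed value each, computed from the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)          # population standard deviation
pi_ = sum(x > 110 for x in population) / len(population)

# Statistics: computed from one sample of 30, so they vary from sample to sample.
sample = rng.sample(population, 30)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)                   # sample standard deviation (n - 1 divisor)
p = sum(x > 110 for x in sample) / len(sample)

print(f"mean:       mu    = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"std. dev.:  sigma = {sigma:.2f}   s     = {s:.2f}")
print(f"proportion: pi    = {pi_:.3f}   p     = {p:.3f}")
```

Drawing a different sample changes `x_bar`, `s`, and `p`, while `mu`, `sigma`, and `pi_` stay the same.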

42 Sampling Variability
There are many different samples that you can take from the population. Statistics can be computed on each sample. Since different members of the population are in each sample, the value of a statistic varies from sample to sample.

43 Sampling Distribution
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. We can then examine the shape, center, and spread of the sampling distribution.
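For a very small population, the sampling distribution can be listed exactly by enumerating every possible sample. The sketch below does this for the sample mean, using a made-up population of five values and samples of size 2.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

# Hypothetical tiny population, chosen so every sample can be listed.
population = [2, 4, 6, 8, 10]
sample_size = 2

# One sample mean for each of the possible samples of size 2.
sample_means = [fmean(s) for s in combinations(population, sample_size)]
distribution = Counter(sample_means)

for value in sorted(distribution):
    print(value, distribution[value])

print("population mean:", fmean(population))
print("mean of the sampling distribution:", fmean(sample_means))
```

There are 10 possible samples here, and the center of their means matches the population mean, while the individual sample means range from 3 to 9.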

44 Bias and Variability
Bias concerns the center of the sampling distribution. A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. To reduce bias, use random sampling. The values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter. Variability is described by the spread of the sampling distribution. To reduce the variability of a statistic from an SRS, use a larger sample. You can make the variability as small as you want by taking a large enough sample.
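The effect of sample size on variability can be checked by simulation. The sketch below is an approximation, not an exact sampling distribution: it draws 1,000 simple random samples at each of three sample sizes from a made-up population and reports the spread of the resulting sample means. The population and the helper `spread_of_sample_means` are assumptions for this example.

```python
import random
import statistics

def spread_of_sample_means(population, n, repetitions, rng):
    """Standard deviation of the sample mean across repeated SRSs of size n."""
    means = [statistics.fmean(rng.sample(population, n))
             for _ in range(repetitions)]
    return statistics.stdev(means)

rng = random.Random(8)

# Hypothetical population of 10,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]

spreads = {n: spread_of_sample_means(population, n, 1000, rng)
           for n in (10, 100, 1000)}

for n, spread in spreads.items():
    print(f"n = {n:4d}: spread of sample means = {spread:.2f}")
```

The spread shrinks as the sample size grows, which is the sense in which a larger sample reduces variability.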

45 Bias and Variability

