
Producing Data: Sample and Experiments. Does my mommy really love me? An advice columnist, Ann Landers, once asked her readers, “If you had it to do.


1 Producing Data: Sample and Experiments

2

3 Does my mommy really love me? An advice columnist, Ann Landers, once asked her readers, “If you had it to do over again, would you have children?” A few weeks later, her column was headlined, “70% OF PARENTS SAY KIDS NOT WORTH IT.” Indeed, 70% of the 10,000 respondents said they would not have children.

4 Designing Samples. Population: This is who we are trying to study. We usually can’t get everyone, though. Sample: A part of the population that represents the whole. What is a true sample? Is our class a sample that represents the school?

5 Types of Samples. Census: When you can survey/test everyone in the population. Voluntary Response (Self-Selected) Sample: When people choose whether or not to respond. Examples: American Idol, a mail-home survey. Convenience Sample: When you survey/test those easiest to reach, such as taking a survey in the quad at lunch. Quota Sample: When you hand-pick a group that seems to match your population. Probability Sample: Each member of the population has a known probability of being in the sample.

6 Probability Sampling. Simple Random Sample (SRS): A sample of size n chosen so that every set of n individuals is equally likely to be chosen. This is the “best” type of sampling. Systematic Sample: Picking every nth individual. For example, every third person that comes through the door will win a prize. This is random, but it is not an SRS: the first two people through the door can’t both win. Stratified Random Sample: The population is divided into subgroups (strata) that are similar in some way, and then individuals are chosen at random from each stratum. The strata can be sampled in proportion: if 55% of the population is female, then I will make sure that my sample is 55% female. This is beneficial if you want to represent certain groups of a population, or you need to make sure a certain group is represented. For example, in a large population Native Americans might make up 2%, and an SRS might include few or none of them. A stratified sample makes sure they appear in your sample.
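The three probability sampling methods on this slide can be sketched in Python. This is only an illustration under assumptions: the function names, the population of 100 numbered individuals, and the 55/45 strata are made up for the example, and the systematic sample uses a random starting point.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every set of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every step-th individual."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, n, rng):
    """Take an SRS inside each stratum, sized in proportion to the stratum."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)  # this stratum's share of the sample
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(100))  # hypothetical individuals labelled 0-99
strata = {
    "female": [f"F{i}" for i in range(55)],  # 55% of the population
    "male": [f"M{i}" for i in range(45)],    # 45% of the population
}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 3, rng)
stratified = stratified_sample(strata, 20, rng)  # 11 female, 9 male
```

In the systematic sample, two neighbours in line can never both be chosen, which is why it is random without being an SRS.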

7 Other Sampling Methods. Cluster (Area) Sampling: The population is split into clusters, and only certain clusters are studied to get a feel for the population. If I want to get a feel for town governments, an SRS would require too much travelling. So, we randomly choose five counties (these are the clusters) and then study every town government in those counties. This saves travelling time but still gives us a random sample of the population.
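The county example can be sketched as follows. The county and town names, and the count of 20 counties with four towns each, are invented for illustration only.

```python
import random

def cluster_sample(clusters, k, rng):
    """Randomly choose k clusters, then keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: 20 counties, each containing 4 towns.
counties = {
    f"County {c}": [f"Town {c}-{t}" for t in range(1, 5)]
    for c in range(1, 21)
}

selected = cluster_sample(counties, 5, rng)
towns_to_visit = [town for towns in selected.values() for town in towns]
```

The randomness lives in the choice of counties; inside a chosen county nothing is sampled, because every town government is studied.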

8 More Sampling Methods. Multistage Sampling: This is when you use a sampling method, or a combination of sampling methods, more than once to get a sample of the population. If I want to interview California residents, I might do a cluster sample to pick counties and then an SRS to pick individuals. This is a two-stage sample. If I want to study US high school seniors, I might do a stratified random sample to get districts of certain demographics, then an SRS to get a smaller number of schools, then an SRS of seniors in those schools. This is a three-stage sample.

9 Sampling Bias. Bias: A design is biased if it systematically favors a certain outcome. Undercoverage: This is when certain groups are left out of the sample. A telephone survey can have undercoverage, because people without phones aren’t included. How is a systematic sample biased? Most samples, no matter how good, suffer from some undercoverage. Nonresponse: An individual can’t be contacted or refuses to participate. This occurs if I randomly call 100 houses, but only 50 are reached or only 50 agree to participate. Response Bias: Occurs if a respondent gives false answers, can’t understand the question, or wants to please the interviewer, or if the ordering of the questions favors an answer. Wording Bias: The wording of a question affects the outcome. “Don’t you think the driving age should be raised to 18 since teenagers are so reckless?”

10 Using a Table of Random Digits. When you pick an SRS, you need to be random. We can use a table of random digits (Table B). Assign a numerical label to every individual. Make sure that every label has the same number of digits. Don’t use 0001-1000, because then you have to use four-digit numbers. Instead use 000-999. Use Table B to select at random.
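Reading labels off a table of random digits can be mimicked in code. The sketch below assumes the usual convention of skipping repeated labels and labels that belong to no one; the digit string is made up for the example and is not a line from Table B.

```python
def pick_labels(digits, width, population_size, n):
    """Read fixed-width labels left to right, skipping repeats and unused labels."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore spacing
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if label < population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# Hypothetical line of digits; the population of 800 is labelled 000-799.
line = "07148 20719 35604 12"
picked = pick_labels(line, width=3, population_size=800, n=3)
print(picked)  # [71, 482, 604]
```

Here the groups read are 071, 482, 071, 935, 604. The second 071 is skipped as a repeat, and 935 is skipped because no individual has that label.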

11 Using Your Calculator. Go to MATH/PRB and choose randInt. randInt(1, 100, 23): You will randomly pick 23 numbers between 1 and 100. randInt(0, 99, 45): You will randomly pick 45 numbers between 0 and 99.
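For readers without the calculator, a rough Python counterpart of the two commands is sketched below. The helper name `rand_int` is hypothetical. Integers drawn this way can repeat, so the last line shows one way to draw 23 distinct labels when an SRS is wanted.

```python
import random

def rand_int(low, high, count, rng):
    """Draw `count` random integers from low to high, inclusive (repeats allowed)."""
    return [rng.randint(low, high) for _ in range(count)]

rng = random.Random(2)  # fixed seed so the sketch is repeatable

first = rand_int(1, 100, 23, rng)   # 23 numbers between 1 and 100
second = rand_int(0, 99, 45, rng)   # 45 numbers between 0 and 99

# For an SRS, draw without replacement so no label is chosen twice.
no_repeats = rng.sample(range(1, 101), 23)
```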

12

13 Observational Study vs. Experiment. An observational study observes and records behavior but does not impose a treatment. Example: I’m going to take a survey to see how many students drink energy drinks. An experiment is a study in which the researcher imposes some sort of treatment. Example: I want to determine the effects of energy drinks on hours of sleep, so I’m going to give some students energy drinks while the others aren’t allowed to drink them. The difference is that an experiment imposes a treatment on experimental units and an observational study does not.

14 Experimental Units and Treatments. An experimental unit is an individual on which a treatment is imposed. An experimental unit is called a subject if it is a person. A treatment is a specific experimental condition applied to the experimental units. Two different individuals in an experiment might get two different treatments: one might get an energy drink, another might not. To find the number of treatments when there is more than one variable, use the multiplication principle. For example, I am testing based on energy drinks and number of classes. There are two options for energy drinks (yes or no) and three for number of classes (4, 5, 6), which means that there are 2 × 3 = 6 total treatments.

15 Explanatory and Response Variables. An explanatory variable is what is being implemented, such as the amount of caffeine given or the dosage of blood pressure medicine. Each explanatory variable is referred to as a factor. A factor can have different levels. In our drink and classes experiment there are two factors: energy drink and number of classes. Two levels in one factor (yes/no) and three in the other (4, 5, 6) create six different treatments. A response variable is what is being measured, such as blood pressure or the number of hours of sleep. An experiment is usually trying to determine if or how the explanatory variable affects the response variable.
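The multiplication principle for factors and levels can be checked by listing every combination. The dictionary layout below is just one convenient way to write the two factors from the energy drink example.

```python
from itertools import product

# Each factor maps to its levels, as in the energy drink example.
factors = {
    "energy drink": ["yes", "no"],
    "number of classes": [4, 5, 6],
}

# One treatment is one combination of a level from every factor.
treatments = list(product(*factors.values()))

for treatment in treatments:
    print(treatment)

print(len(treatments))  # 2 levels x 3 levels = 6 treatments
```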

16 Principles of Experimental Design. There are three principles of experimental design: 1. Control 2. Randomization 3. Replication

17 1. Control. The biggest aspect of the actual experiment is whether or not you are controlling lurking variables and confounding. Is it the treatment that is affecting the response variable, or is it something else? Lurking variables are those that are not among the explanatory and response variables but can influence results. Many experiments are controlled with a placebo. Half of the class will get the love potion while the other half gets sugar water. This way we know if it’s the love potion or just newfound confidence. Controlling experiments reduces the chances of confounding. Confounding occurs when you cannot distinguish whether the explanatory variable is causing an effect or another variable is also contributing to it.

18 Controlling Bias. You can avoid some personal bias by blinding experiments. All experiments should at least be single blind: the subjects should not be aware that their treatment is different from someone else’s. You don’t tell the subject her dosage is higher. To avoid bias from the person implementing the experiment, it can be made double blind. In this case neither the implementer nor the subjects are aware of the differences in the treatments. The doctor does not know whether he is giving medicine or a placebo.

19 2. Randomization How are you picking your units/subjects? You want to equalize groups so that lurking variables will be equal among the different groups. We want to make the groups as equal as possible except for difference in treatments. If I were to study heart medicine I wouldn’t put all the people who have had heart attacks in one group. I would want them to be in both groups. You can use the different methods of sampling in order to create randomization.

20 3. Replication. The more units/subjects I have, the better. The bigger the number, the more likely you are to have a representation of the population. This reduces the effect of chance variation on the results. I don’t have to run the experiment more than once. I just need to have a lot of experimental units.

21

22 Completely Randomized Design. A completely randomized design takes a random sample from the population that we are trying to study. This is like an SRS. In a completely randomized design each treatment is unique and independent from the others. Example: I want to test the effects of energy drinks and number of classes on sleep. I have created six treatment groups based on the two factors. I put the names of the 300 high school students that have volunteered in a hat. The first 50 names pulled will be in the yes/4 group, the next 50 in the yes/5 group, and so on. We will measure every individual’s sleeping patterns for a month and then compare.
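Pulling names from a hat is the same as shuffling the list of volunteers and dealing it out in order. A minimal sketch, with made-up student labels standing in for the 300 names:

```python
import random
from itertools import product

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then give each treatment the next equal-sized slice."""
    shuffled = list(units)
    rng.shuffle(shuffled)  # the "hat"
    size = len(shuffled) // len(treatments)  # any leftover units are not assigned
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }

rng = random.Random(3)  # fixed seed so the sketch is repeatable
volunteers = [f"student {i}" for i in range(1, 301)]  # hypothetical names
treatments = list(product(["yes", "no"], [4, 5, 6]))  # yes/4, yes/5, ..., no/6

groups = completely_randomized(volunteers, treatments, rng)
```

The first 50 names in the shuffled list form the yes/4 group, the next 50 the yes/5 group, and so on.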

23 Block Design. A block design separates the population into blocks and tests them individually. This is the same as a stratified random sample. We could create gender blocks of men and women. Each block receives the exact same treatments. Although it is nice, blocks do not have to be the same size. We can have 55 men and 45 women. Example: Using the same information on energy drinks from the previous slide, I will split up the 300 volunteers into two groups based on gender. I will then take all the men and randomly put them into six groups (one for each treatment) using an SRS and run the experiment as before. I will then take the women and put them into six groups (one for each treatment) using an SRS and run the experiment. I will collect data for a month and then compare the results.
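In code, a block design repeats the random assignment separately inside each block. The sketch below assumes a 165/135 split of the 300 volunteers (the slide gives no actual counts) and deals shuffled names round-robin so unequal block sizes still spread evenly over the six treatments.

```python
import random
from itertools import product

def randomize_within_blocks(blocks, treatments, rng):
    """Shuffle each block on its own, then deal its members across the treatments."""
    assignment = {}
    k = len(treatments)
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, treatment in enumerate(treatments):
            # every k-th name starting at position i goes to treatment i
            assignment[(block_name, treatment)] = shuffled[i::k]
    return assignment

rng = random.Random(4)  # fixed seed so the sketch is repeatable
treatments = list(product(["yes", "no"], [4, 5, 6]))
blocks = {
    "men": [f"M{i}" for i in range(165)],    # assumed block size
    "women": [f"W{i}" for i in range(135)],  # assumed block size
}

assignment = randomize_within_blocks(blocks, treatments, rng)
```

No man is ever compared against a randomization that included women, which is what keeps gender from being mixed up with the treatment effect.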

24 Matched Pairs. A matched pairs design is a type of block design that compares only two treatments. I will have several pairs of fish tanks in different parts of the room. One gets one fish food, one gets the other. In this case the different parts of the room are the blocks. You can also have one subject get both treatments. Which is better, Dr. Pepper or Diet Dr. Pepper? In this case, each individual is the block. Example: I want to determine if a new type of bicycle tire will last longer than the old one. I have found 100 bicyclists and asked them to take one new tire and one old tire. 50 of them will put the new tire on the front and the old on the back, and the other 50 will do the opposite. We will measure each tire on a 10-point scale, find the difference between the new and old (n – o), and review our results.

25 What the Data Looks Like. Completely Randomized and Block Designs: You will have at least two lists of data, one for each treatment group. In our example, the group that had the energy drink and four classes should have 50 pieces of data measuring each individual’s average hours of sleep during that month. y/4: {7.1, 8.0, 6.8, …} y/5: {7.0, 8.0, 6.6, …} y/6: {6.8, 7.1, 7.2, …} … Matched Pairs: Since we are comparing two treatments in individual blocks, we will be looking at one list of data, usually representing a difference. In our example with the tires, we would have 100 numbers representing the difference (New – Old) from each biker’s tires. Difference: {1.0, 0.5, -0.2, 0.0, 2.0, …}
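For matched pairs, the single list of data comes from subtracting within each pair. A small sketch with invented tire scores (four riders only, not the slide's data):

```python
from statistics import mean

def paired_differences(pairs):
    """Return new minus old for each (new, old) pair of scores."""
    return [new - old for new, old in pairs]

# Hypothetical (new tire, old tire) scores on the 10-point scale, one pair per rider.
scores = [(8.0, 7.0), (6.5, 6.0), (7.0, 7.0), (9.0, 7.0)]

differences = paired_differences(scores)
average_difference = mean(differences)

print(differences)         # [1.0, 0.5, 0.0, 2.0]
print(average_difference)  # 0.875
```

A positive difference means the new tire scored higher for that rider, so the analysis works on one list of differences rather than two separate lists.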

26

27 Simulations. You can run simulations the same way that you do an SRS. I want to run a simulation of picking ten people where 53% are men and 47% are women. Either let 00-52 represent men and 53-99 represent women, or let 01-53 represent men and 54-99 and 00 represent women. I can use Table B or randInt on my calculator. How many women were picked in this simulation?
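The first labelling (00-52 for men, 53-99 for women) can be simulated like this. The function names are hypothetical, and the seed is fixed only so the run can be repeated.

```python
import random

def label(number):
    """Map a two-digit number: 00-52 is a man (53%), 53-99 is a woman (47%)."""
    return "man" if number <= 52 else "woman"

def simulate_picks(n, rng):
    """Draw n two-digit numbers and translate each into man or woman."""
    return [label(rng.randint(0, 99)) for _ in range(n)]

rng = random.Random(5)  # fixed seed so the sketch is repeatable
picks = simulate_picks(10, rng)
women_picked = picks.count("woman")
```

Running the simulation many times and recording `women_picked` each time would show how much the count varies from one group of ten to the next.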

28

29 Know how to take an SRS from a large population. Observational study: (1) Put a name in a hat for every individual in the population and choose n individuals. (2) Assign every individual in the population a number and use an RNG or a table of random digits to pick n people. Experiment: (1) Put all the experimental units’ names or assigned numbers in a hat; the first n/2 you pull go into one group, and the remaining go in the other group. (2) For every individual, flip a coin: if it’s heads they go into one group, if it’s tails they go in the other group. Once one group reaches n/2, the remaining individuals go in the other group. You have to make sure that the individuals are taken in a random order. You would not want to go through students in order of grade in a class, because the last students would all be put into one group, and they are all the students with the lowest grades. (3) For every individual, roll a die. If it’s a 1 or 2 they go into one group… As with the coin, you have to make sure that individuals are taken in a random order.

30 Know who you can draw conclusions about. You can only draw conclusions about the group from which you drew your sample. If I took 100 random students from CV, I could only draw conclusions about CV students, not all students. If I took 100 students from California, I could only draw conclusions about students from California, not the nation. It also does not matter how many you take as long as it’s random. If I randomly chose 5 students from CV, I could make a conclusion about students from CV as long as it’s random. It does not matter how large the sample is. We will talk about the setbacks of small samples second semester.
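The coin-flip method for an experiment, including the rule that a full group sends everyone else to the other group, might look like the sketch below. The function name and the 30 numbered units are made up for the example.

```python
import random

def coin_flip_assignment(units, rng):
    """Assign units to two groups by coin flip, capping each group at its share."""
    order = list(units)
    rng.shuffle(order)  # take the individuals in a random order first
    cap_a = len(order) // 2
    cap_b = len(order) - cap_a
    group_a, group_b = [], []
    for unit in order:
        if len(group_a) == cap_a:      # group A is full, so the rest go to B
            group_b.append(unit)
        elif len(group_b) == cap_b:    # group B is full, so the rest go to A
            group_a.append(unit)
        elif rng.random() < 0.5:       # heads
            group_a.append(unit)
        else:                          # tails
            group_b.append(unit)
    return group_a, group_b

rng = random.Random(6)  # fixed seed so the sketch is repeatable
units = list(range(1, 31))  # hypothetical units numbered 1-30
group_a, group_b = coin_flip_assignment(units, rng)
```

The shuffle is what protects against the problem described on the slide: without it, whoever happens to come last in the list (for example, the lowest grades) would all land in the same group once the other one fills.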

