Section 1.1 The Structure of Data.

1 Section 1.1 The Structure of Data

2 Hollywood Movies
Do movies that are comedies tend to get higher audience ratings than movies that are dramas? In a dataset to answer this question, what are the cases? (a) Comedies (b) Dramas (c) Movies (d) Audience ratings. Answer: We are collecting data about movies, so the cases are the movies.

3 Hollywood Movies
Do movies that are comedies tend to get higher audience ratings than movies that are dramas? In a dataset to answer this question, how many variables are there? (a) 1 (b) 2 (c) 3 (d) 4 (e) 5. Answer: There are two variables: whether the movie is a comedy or a drama, and the audience rating for the movie.

4 Hollywood Movies
Do movies that are comedies tend to get higher audience ratings than movies that are dramas? In a dataset to answer this question, how many of the variables are categorical? (a) 1 (b) 2. Answer: One. Whether the movie is a comedy or a drama is categorical.

5 Hollywood Movies
Do movies that are comedies tend to get higher audience ratings than movies that are dramas? In a dataset to answer this question, how many of the variables are quantitative? (a) 1 (b) 2. Answer: One. The audience rating is quantitative.
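A minimal sketch of the dataset described on slides 2 to 5, assuming Python: each case (a movie) is one record, genre is the categorical variable, and audience rating is the quantitative variable. The class name `Movie`, its field names, and the four rows are hypothetical placeholders invented only to show the layout; they are not real ratings.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Movie:
    """One case in the dataset: a single movie (hypothetical record layout)."""
    title: str              # identifies the case; not one of the two variables
    genre: str              # categorical variable: "Comedy" or "Drama"
    audience_rating: float  # quantitative variable


def mean_rating_by_genre(movies: list[Movie]) -> dict[str, float]:
    """Group cases by the categorical variable and average the quantitative one."""
    groups: dict[str, list[float]] = {}
    for movie in movies:
        groups.setdefault(movie.genre, []).append(movie.audience_rating)
    return {genre: mean(ratings) for genre, ratings in groups.items()}


# Made-up placeholder rows, purely to show the shape of the data.
movies = [
    Movie("Movie A", "Comedy", 60.0),
    Movie("Movie B", "Comedy", 80.0),
    Movie("Movie C", "Drama", 50.0),
    Movie("Movie D", "Drama", 70.0),
]
print(mean_rating_by_genre(movies))  # {'Comedy': 70.0, 'Drama': 60.0}
```

Comparing the group means is one natural way to start answering the comedy-versus-drama question once real data are collected.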

6 Counties with the highest kidney cancer death rates
Note: for fun, ask students to hypothesize about kidney cancer risk factors. [Map: counties with the highest kidney cancer death rates.] Source: Gelman et al., Bayesian Data Analysis, CRC Press, 2004.

7 Counties with the lowest kidney cancer death rates
Key: sample size. Smaller counties are more likely to have extreme death rates; tell students they will learn about this. [Map: counties with the lowest kidney cancer death rates.]
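The "Key: sample size" point can be seen in a short simulation, assuming Python and a made-up setup (not the Gelman data) in which every county has exactly the same underlying death rate. Death counts are drawn from a Poisson distribution as an approximation; the constants `TRUE_RATE`, `SMALL_POP`, `LARGE_POP`, and `N_EACH` are hypothetical.

```python
import math
import random
import statistics

# Hypothetical setup: every county has the SAME true death rate, so any
# differences in observed rates are due to chance alone.
TRUE_RATE = 0.0005  # assumed deaths per person over the study period
SMALL_POP = 1_000   # assumed population of a "small" county
LARGE_POP = 100_000 # assumed population of a "large" county
N_EACH = 200        # number of simulated counties of each size


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_rates(population: int, n_counties: int, rng: random.Random) -> list[float]:
    """Observed death rate (deaths / population) for each simulated county."""
    return [poisson(rng, population * TRUE_RATE) / population for _ in range(n_counties)]


rng = random.Random(2004)
small = simulate_rates(SMALL_POP, N_EACH, rng)
large = simulate_rates(LARGE_POP, N_EACH, rng)

# Rank all counties by observed rate and see which sizes land at the extremes.
ranked = sorted([("small", r) for r in small] + [("large", r) for r in large],
                key=lambda county: county[1])
print("10 lowest rates: ", [size for size, _ in ranked[:10]])
print("10 highest rates:", [size for size, _ in ranked[-10:]])
print(f"SD of rates, small counties: {statistics.pstdev(small):.6f}")
print(f"SD of rates, large counties: {statistics.pstdev(large):.6f}")
```

With these settings both the lowest and the highest observed rates belong to small counties. A county of 1,000 people expects only 0.5 deaths, so zero deaths gives a rate of 0 and a single death already doubles the true rate, while a county of 100,000 expects about 50 deaths and stays close to 0.0005.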

8 Kidney Cancer
If the values in the kidney cancer dataset are rates of kidney cancer deaths, then what are the cases? (a) The people living in the US (b) The counties of the US. Answer: The counties. A person either has kidney cancer or doesn't, so a rate must apply to a group of people, such as a county.

9 Kidney Cancer
If the values in the kidney cancer dataset are yes/no, then what are the cases? (a) The people living in the US (b) The counties of the US. Answer: The people. A person either has kidney cancer or doesn't; yes/no doesn't make sense for a county.

10 Kidney Cancer
If the cases in the kidney cancer dataset are counties, then the measured variable is… (a) Categorical (b) Quantitative. Answer: Quantitative. Rates are numbers.

11 Kidney Cancer
If the cases in the kidney cancer dataset are people, then the measured variable is… (a) Categorical (b) Quantitative. Answer: Categorical. Either having kidney cancer or not is a category.
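The two views of the kidney cancer data on slides 8 to 11 are linked: a county's rate is computed from person-level yes/no values. A minimal sketch, assuming Python; the county names, the records, and the function name `county_rates` are hypothetical.

```python
from collections import defaultdict

# Person-level data: each case is a person, and the variable is categorical (yes/no).
# These records are hypothetical placeholders.
people = [
    ("County X", True), ("County X", False), ("County X", False), ("County X", False),
    ("County Y", False), ("County Y", False),
]


def county_rates(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Aggregate yes/no cases into one quantitative rate per county."""
    deaths: dict[str, int] = defaultdict(int)
    population: dict[str, int] = defaultdict(int)
    for county, died_of_kidney_cancer in records:
        population[county] += 1
        deaths[county] += died_of_kidney_cancer  # True counts as 1, False as 0
    return {county: deaths[county] / population[county] for county in population}


# County-level data: each case is a county, and the variable (a rate) is quantitative.
print(county_rates(people))  # {'County X': 0.25, 'County Y': 0.0}
```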

12 Let’s Collect Some Data!
Using Data to Answer a Question. QUESTION: If you are romantically interested in someone, should you be obvious about it, or should you play hard to get?

13 Romance
What type of person are you generally more romantically interested in? (a) Someone who is obviously into you (b) Someone who plays hard to get. Note: ask students what else they may want to know…

14 Romance
MALES ONLY: What type of person are you generally more romantically interested in? (a) Someone who is obviously into you (b) Someone who plays hard to get. Note: ask students what else they may want to know…

15 Romance
FEMALES ONLY: What type of person are you generally more romantically interested in? (a) Someone who is obviously into you (b) Someone who plays hard to get. Note: ask students what else they may want to know…

16 Sampling from a Population
Section 1.2 Sampling from a Population

17 Review
We want to know what percentage of people are vegetarian. The cases are people. The measured variable is (a) Categorical (b) Quantitative. Answer: Categorical. Each person either is or is not a vegetarian.

18 Most Important to You
Which of the following is most important to you? (a) Athletics (b) Academics (c) Social Life (d) Community Service (e) Other

19 Most Important to You
Suppose researchers studying student life use the results of our clicker question to investigate what students find important. What is the sample? What is the population? Can the sample data be generalized to make inferences about the population? Why or why not?

20 Non-Random Samples
Suppose you want to estimate the average number of hours that students spend studying each week. Which of the following is the best method of sampling? (a) Go to the library and ask all the students there how much they study (b) Email all students asking how much they study, and use all the data you get (c) Give a clicker question in this class and force every student to respond (d) Stand outside the student center and ask everyone going in how much they study. Note: students will probably give a variety of answers, making a good opportunity for rich discussion.

21 Context: “If you had it to do over again, would you have children?”
Ann Landers' column asked readers, “If you had it to do over again, would you have children?” The first request for data contained a letter from a young couple that listed worries about parenting and various reasons not to have kids: 30% said “yes.” The second request for data was in response to this number, in which Ann wrote how she was “stunned, disturbed, and just plain flummoxed”: 95% said “yes.”

22 Having Children
If we were to run the question all by itself in the newspaper with a request for responses, could we trust the results? (a) Yes (b) No. Answer: No. This would suffer from volunteer bias; we need a random sample.

23 Having Children
Newsday conducted a random sample of all US adults and asked them the same question, without any additional leading material: 91% said “yes.” Do you think the true proportion of US parents who are happy they had children is close to 91%? (a) Yes (b) No. Answer: Yes. Because this is a random sample, the population proportion should be close to the sample proportion.
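The contrast between slides 22 and 23 can be simulated, assuming Python. The sketch treats the Newsday figure of 91% as the true population proportion, and the response probabilities `RESPOND_IF_YES` and `RESPOND_IF_NO` are invented to mimic volunteer bias; none of these numbers describe the actual surveys.

```python
import random

rng = random.Random(21)

# Hypothetical population: assume 91% of 100,000 parents would say "yes".
population = [True] * 91_000 + [False] * 9_000  # True = would have children again

# Random sample: every parent is equally likely to be chosen.
random_sample = rng.sample(population, 1_000)
random_estimate = sum(random_sample) / len(random_sample)

# Volunteer sample: assume unhappy parents are far more motivated to write in.
RESPOND_IF_YES = 0.02  # hypothetical response probability for "yes" parents
RESPOND_IF_NO = 0.30   # hypothetical response probability for "no" parents
volunteers = [answer for answer in population
              if rng.random() < (RESPOND_IF_YES if answer else RESPOND_IF_NO)]
volunteer_estimate = sum(volunteers) / len(volunteers)

print(f"random sample estimate:    {random_estimate:.2f}")
print(f"volunteer sample estimate: {volunteer_estimate:.2f}")
```

Under these assumed response rates the volunteer estimate lands near 40%, far from the truth even though it is based on thousands of responses, while the random sample of 1,000 lands within a couple of percentage points of 91%.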

24 Summary
Always think critically about how the data were collected, and recognize that not all forms of data collection lead to valid inferences. This is the easiest way to instantly become a more statistically literate individual!

25 Experiments and Observational Studies
Section 1.3 Experiments and Observational Studies

26 Review
To estimate the proportion of students who support a smoke-free campus, you send an email to all students asking, “Do you support a smoke-free campus?” and compute the proportion of respondents who say yes. The data collected are (a) Not biased (b) Biased because of wording bias (c) Biased because asked over email instead of in person (d) Biased because responses may be inaccurate (e) Biased because volunteer samples are almost always biased

27 Causal Association? “Daily Exercise Improves Mental Performance”
The wording of this headline implies… (a) Association (not necessarily causal) (b) Causal association. Answer: Causal association. This implies that exercising daily will improve (change) your mental performance.

28 Causal Association? “Want to lose weight? Eat more fiber!”
The wording of this headline implies… (a) Association (not necessarily causal) (b) Causal association. Answer: Causal association. This implies that eating fiber will cause you to lose weight.

29 “Cat owners tend to be more educated than dog owners”
Causal Association? The wording of this headline implies… (a) Association (not necessarily causal) (b) Causal association. Answer: Association. There is no claim that owning a cat will change your education level.

30 College Education and Aging
“Education seems to be an elixir that can bring us a healthy body and mind throughout adulthood and even a longer life,” says Margie E. Lachman, a psychologist at Brandeis University who specializes in aging. For those in midlife and beyond, a college degree appears to slow the brain’s aging process by up to a decade, adding a new twist to the cost-benefit analysis of higher education, for young students as well as those thinking about returning to school. (“A Sharper Mind, Middle Age and Beyond,” NY Times, 1/19/12) Are you convinced that a college degree slows the brain’s aging? (a) Yes (b) No. Answer: No. People who go to college may be different to begin with!

31 Single-Sex Dorms
The president of a large university recently announced that the school would be switching to all single-sex dorms. He cites studies stating that, in universities that offer both same-sex and co-ed housing, students in co-ed dorms report hooking up for casual sex more often. Can we conclude from these studies that this new policy will reduce the number of student hook-ups? (a) Yes (b) No. Answer: No. Confounding variable: desire for casual sex, which may drive both choosing to live in a co-ed dorm and hooking up. Source: Stepp, “Single-sex dorms won’t stop drinking or ‘hooking up’,” June 16, 2011.
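A hypothetical simulation of how a confounder produces an association without causation, assuming Python. In this made-up model, dorm type has no effect at all on hooking up; only the confounder (desire for casual sex) does. The probabilities, `N`, and the function names are invented for illustration and do not model the cited studies.

```python
import random

rng = random.Random(31)
N = 20_000  # hypothetical number of students in each scenario

# Assumed model: a student's desire for casual sex (the confounder) raises the
# chance of hooking up. Dorm type itself has NO effect on hooking up here.
P_DESIRE = 0.5
P_HOOKUP = {True: 0.6, False: 0.1}  # keyed by desire


def difference_in_hookup_rates(students: list[tuple[bool, bool]]) -> float:
    """Hook-up rate in co-ed dorms minus the rate in single-sex dorms."""
    coed = [hooked for is_coed, hooked in students if is_coed]
    single = [hooked for is_coed, hooked in students if not is_coed]
    return sum(coed) / len(coed) - sum(single) / len(single)


def simulate(randomized: bool) -> float:
    students = []
    for _ in range(N):
        desire = rng.random() < P_DESIRE
        if randomized:
            is_coed = rng.random() < 0.5  # dorm assigned by coin flip
        else:
            is_coed = rng.random() < (0.8 if desire else 0.3)  # students choose
        hooked = rng.random() < P_HOOKUP[desire]  # depends on desire only
        students.append((is_coed, hooked))
    return difference_in_hookup_rates(students)


print(f"observational difference: {simulate(randomized=False):+.2f}")
print(f"randomized difference:    {simulate(randomized=True):+.2f}")
```

When students choose their dorms, co-ed residents hook up noticeably more often (about 25 percentage points under these assumptions) even though the dorm causes nothing; when dorms are assigned at random, the difference shrinks to roughly zero.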

32 It’s a Common Mistake!
“The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.” - Stephen Jay Gould

34 Exercise and the Brain
A study found that elderly people who walked at least a mile a day had significantly higher brain volume (gray matter related to reasoning) and significantly lower rates of Alzheimer’s and dementia than those who walked less. The article states: “Walking about a mile a day can increase the size of your gray matter, and greatly decrease the chances of developing Alzheimer's disease or dementia in older adults, a new study suggests.” Is this conclusion valid? (a) Yes (b) No. Answer: No. This is an observational study, which cannot yield causal conclusions. Source: Allen, N., “One way to ward off Alzheimer’s: Take a Hike,” msnbc.com, 10/13/10.

35 Exercise and the Brain
A sample of mice was divided randomly into two groups. One group was given access to an exercise wheel; the other group was kept sedentary. “The brains of mice and rats that were allowed to run on wheels pulsed with vigorous, newly born neurons, and those animals then breezed through mazes and other tests of rodent IQ” compared to the sedentary mice. Is this evidence that exercise causes an increase in brain activity and IQ, at least in mice? (a) Yes (b) No. Answer: Yes. This is a randomized experiment, which can yield causal conclusions. Source: Reynolds, “Phys Ed: Your Brain on Exercise,” NY Times, July 7, 2010.

36 Knee Surgery for Arthritis
Researchers conducted a study on the effectiveness of a knee surgery to cure arthritis. It was randomly determined whether people got the knee surgery. Everyone who underwent the surgery reported feeling less pain. Is this evidence that the surgery causes a decrease in pain? (a) Yes (b) No. Answer: No. We need a control or comparison group: what would happen without surgery?

37 Green Tea and Prostate Cancer
A study was conducted on 60 men with PIN lesions, some of which turn into prostate cancer. Half of these men were randomized to take 600 mg of green tea extract daily, while the other half were given a placebo pill. The study was double-blind: neither the participants nor the doctors knew who was actually receiving green tea. After one year, only 1 person taking green tea had gotten cancer, while 9 taking the placebo had gotten cancer.

38 Green Tea and Prostate Cancer
A difference this large is unlikely to happen just by random chance. Can we conclude that green tea really does help prevent prostate cancer? (a) Yes (b) No. Answer: Yes. Good randomized experiments allow conclusions about causality.
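One way to make "unlikely to happen just by random chance" concrete, assuming Python: if green tea had no effect, the 10 men who got cancer would have gotten it regardless, and the only randomness is which 30 of the 60 men were assigned to green tea. Counting those assignments is a one-sided Fisher exact test, a technique the slide itself does not name; the constant and function names are illustrative.

```python
from math import comb

# Numbers from the slide: 60 men, 30 per group, 10 cancers in total,
# only 1 of them in the green tea group.
N_TOTAL, N_TEA, TOTAL_CANCERS, TEA_CANCERS = 60, 30, 10, 1


def prob_tea_cancers(k: int) -> float:
    """P(exactly k of the 10 cancers land in the tea group) if green tea has no effect.

    Under that null hypothesis, which men got cancer is fixed and only the random
    assignment varies, so the tea-group count follows a hypergeometric distribution.
    """
    return (comb(TOTAL_CANCERS, k) * comb(N_TOTAL - TOTAL_CANCERS, N_TEA - k)
            / comb(N_TOTAL, N_TEA))


# One-sided p-value: chance of 1 or fewer cancers in the tea group by luck alone.
p_value = sum(prob_tea_cancers(k) for k in range(TEA_CANCERS + 1))
print(f"one-sided p-value: {p_value:.4f}")
```

The result is roughly 0.006: if green tea did nothing, a random assignment would put 1 or fewer of the 10 cancers in the tea group only about 6 times in 1,000.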

39 Type of Randomization
To see if people read faster on paper or on a Kindle, a study was done in which 16 people read two sets of instructions of similar length, one on a Kindle and one on paper. The order in which they read the instructions was randomized. (Reading was faster on paper.) This is a(n)… (a) Observational study (b) Randomized comparative experiment (c) Matched pairs experiment (d) Concatenated experiment
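The design on this slide, where every reader uses both formats and only the order is randomized, can be sketched as follows, assuming Python. The participant IDs, the seed, and the function name `randomize_order` are hypothetical; the slide states only that there were 16 readers.

```python
import random

# Hypothetical participant IDs; the slide says only that there were 16 readers.
participants = [f"reader_{i:02d}" for i in range(1, 17)]
rng = random.Random(39)


def randomize_order(people: list[str], rng: random.Random) -> dict[str, tuple[str, ...]]:
    """Every person reads on BOTH devices; a random shuffle decides which comes first."""
    schedule = {}
    for person in people:
        order = ["paper", "Kindle"]
        rng.shuffle(order)  # each of the two orders is equally likely
        schedule[person] = tuple(order)
    return schedule


schedule = randomize_order(participants, rng)
for person, (first, second) in list(schedule.items())[:4]:
    print(person, "reads on", first, "then", second)
```

Because each reader is compared with themselves, differences between people (some simply read faster) cancel out, and randomizing the order keeps practice or fatigue effects from favoring one format.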
