# Where do data come from and Why we don’t (always) trust statisticians.


Induction vs. Deduction: the gist of statistics
Deduction: “What is true about the whole, must be true about a part.” Induction: “What is true about the part might be true about the whole.”

Population vs. Sample
The population is the entire group of individuals about which we want information. The sample is the part of the population from which we actually collect information. We use samples to study populations because populations are often impossible or impractical to study in full.

Real Life Example of a Bad Sample
Ann Landers, a famous columnist, collected a sample of 10,000 people who wrote in to answer this question: “If you could do it all over again, would you have children?” 70% of the respondents said that they would not have children. When a sample was selected at random, 91% of the people said that they would have children.

Potential problems with sample surveys
Undercoverage occurs when some groups in the population are left out of the process of choosing the sample. Nonresponse occurs when an individual chosen for the sample cannot be contacted or refuses to respond.

Another Real life Example of a Bad Sample
In 1936, the Literary Digest mailed out 10,000,000 ballots asking respondents whether they were going to vote for A. Landon or F.D. Roosevelt. 2,300,000 ballots were returned, predicting a strong win (57%) for Landon.

Another Real life Example of a Bad Sample
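George Gallup surveyed 50,000 people chosen at random. Comparison of forecasts:
- Gallup’s prediction for Roosevelt: 56%
- Gallup’s prediction of the Digest’s result for Roosevelt: 44%
- Digest’s prediction for Roosevelt: 43%
- Actual vote for Roosevelt: 62%

The Literary Digest drew its mailing list from its subscription list, phone directories, lists of car owners, and club memberships. In 1936 these lists over-represented wealthier households, so many Roosevelt voters were left out (undercoverage).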

Right and Wrong Ways to Sample
A simple random sample (SRS) is a sample where (1) each unit of the population has an equal chance of being chosen and (2) all units are chosen independently. A sample is biased if at least one group of individuals has a greater chance of being selected than the others.
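The definition above can be sketched in a few lines of Python. This is only an illustration: the sampling frame `student_ids`, the helper name `simple_random_sample`, and the sample size of 50 are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)          # seeded only so the example is repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: student IDs 1..1000
student_ids = list(range(1, 1001))
chosen = simple_random_sample(student_ids, 50, seed=1)
print(len(chosen), "students selected")
```

Because the selection depends only on the random number generator, no group of students is favored over another.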

Example of a good sample
You want to study the effects of computers on GPA. You don’t have the resources to study all students. To select a sample of students for the study, you: get a list of all students; select students at random from the list; collect information from the selected students; compare those who have a computer with those who don’t.

Example of a bad sample
You want to study the effects of computers on GPA. You don’t have the resources to study all students. To select a sample of students for the study, you: use your friends; hang an ad in the computer lab; post an on-line questionnaire on the WKU site.

Stratified Random Sample
When we know the proportion of each group in the population, a stratified random sample is better than a simple random sample. In a stratified sample, the number of people chosen from each group is proportional to the size of that group in the population.
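A minimal sketch of proportional allocation follows. The four class-year strata and their sizes are made-up numbers, and `stratified_sample` is a hypothetical helper written for this illustration.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps a group name to its list of units.
    Each group contributes units in proportion to its share of the population.
    Note: rounding can make the total differ slightly from n in general."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)      # proportional allocation
        sample[name] = rng.sample(units, k)    # random draw within the group
    return sample

# Hypothetical population of 1,000 students split by class year
strata = {
    "freshmen": [f"F{i}" for i in range(400)],
    "sophomores": [f"S{i}" for i in range(300)],
    "juniors": [f"J{i}" for i in range(200)],
    "seniors": [f"R{i}" for i in range(100)],
}
picked = stratified_sample(strata, 50, seed=1)
sizes = {name: len(units) for name, units in picked.items()}
print(sizes)
```

With these sizes a sample of 50 takes 20, 15, 10, and 5 students from the four groups, matching their 40/30/20/10 percent shares.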

Confounding
Two explanatory variables are confounded when their effects on the response variable cannot be distinguished from each other. Confounding is often a problem with a study that uses sample surveys to collect data (even if sampling is done right).

Observation vs. Experiment
An observational study observes individuals and measures variables but does not attempt to influence responses. An experiment imposes a treatment on individuals to observe their responses.

How to design an Experiment
The purpose of an experiment is to find out how one variable (the response variable) changes in response to a change in another variable (the explanatory variable). Experiment: Subject → Treatment → Response

Placebo Effect
The placebo effect is a change in behavior due to participation in an experiment. It is a problem when the experiment does not have a control group (a basis for comparison). To avoid the problem, design a randomized comparative experiment.

How to design a Randomized Comparative Experiment
Randomly split the subjects into two groups: a control group, which receives no treatment, and a treatment group, which receives the treatment. Compare the results. Both groups will be equally affected by the placebo effect, so the difference between the groups shows whether the treatment works.
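The random split can be sketched as follows. The subject labels and the helper name `randomize` are assumptions for the example; the point is only that chance, not the experimenter, decides who lands in each group.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and cut the list in half:
    first half -> treatment group, second half -> control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 20 volunteers
subjects = [f"subject_{i}" for i in range(20)]
treatment, control = randomize(subjects, seed=7)
print(len(treatment), "in treatment,", len(control), "in control")
```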

How to interpret results of an experiment
Observe outcomes for treatment and control groups. If outcomes are different enough so that we can say that this difference would rarely occur by chance, we conclude that the difference is statistically significant.
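One way to make "would rarely occur by chance" concrete is a permutation test, sketched below. This technique is not named in the slides and is swapped in here for illustration; the scores are invented, and `permutation_p_value` is a hypothetical helper. The idea is to reshuffle the group labels many times and count how often a difference at least as large as the observed one appears.

```python
import random
from statistics import mean

def permutation_p_value(treatment, control, n_shuffles=5000, seed=0):
    """Share of random relabelings whose difference in group means
    is at least as large as the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    k = len(treatment)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                      # pretend labels are arbitrary
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Invented outcome scores for eight subjects per group
treatment_scores = [78, 85, 90, 88, 84, 91, 87, 83]
control_scores = [72, 75, 80, 70, 77, 74, 79, 73]
p = permutation_p_value(treatment_scores, control_scores)
print("estimated p-value:", p)
```

A small value means that chance alone rarely produces such a gap, which is what the slide calls a statistically significant difference.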

Population vs. Sample
The population is the entire group of individuals about which we want information. The sample is the part of the population from which we actually collect information. Based on the sample, we draw conclusions about the whole population.

Parameter vs. Statistic
A parameter is a number that describes the population. A statistic is a number that describes the sample. We use statistics to estimate parameters.

Sampling Distribution
The result of your study is a statistic, which can vary from sample to sample. The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Estimate = True Parameter + Sampling Error
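A sampling distribution can be approximated by simulation: draw many samples of the same size and record the statistic each time. The population below (10,000 people, 60% answering "yes") and the sample size of 100 are assumptions chosen for the sketch.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

# Hypothetical population: 1 = "yes", 0 = "no"
population = [1] * 6000 + [0] * 4000
true_parameter = mean(population)            # population proportion of "yes"

def sample_proportion(n):
    """Statistic: proportion of 'yes' in one simple random sample of size n."""
    return mean(rng.sample(population, n))

# 2,000 repeated samples of size 100 approximate the sampling distribution
estimates = [sample_proportion(100) for _ in range(2000)]
sampling_errors = [est - true_parameter for est in estimates]

print("true parameter:", true_parameter)
print("mean of estimates:", round(mean(estimates), 3))
print("spread of estimates:", round(stdev(estimates), 3))
```

Each entry of `sampling_errors` is one realization of Estimate minus True Parameter; the errors scatter around zero because the sampling here is random.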

Bias and variability
A statistic is biased if the mean of its sampling distribution is not equal to the true value of the parameter being estimated. The variability of a statistic is the spread of its sampling distribution. Bias does not go away with larger samples. Variability shrinks as samples get larger.
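The contrast between bias and variability can be checked with a small simulation. Everything here is a made-up illustration: the population has a true proportion of 0.6, and the hypothetical `biased_frame` leaves out half of the "no" group (undercoverage), so samples drawn from it center near 0.75 regardless of size.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)

population = [1] * 6000 + [0] * 4000      # true proportion: 0.6
biased_frame = [1] * 6000 + [0] * 2000    # undercoverage: half of the 0s missing

def simulate(frame, n, repeats=1000):
    """Return (center, spread) of the sample proportion over repeated samples."""
    estimates = [mean(rng.sample(frame, n)) for _ in range(repeats)]
    return mean(estimates), stdev(estimates)

results = {}
for label, frame in [("full population", population), ("biased frame", biased_frame)]:
    for n in (25, 400):
        results[(label, n)] = simulate(frame, n)
        center, spread = results[(label, n)]
        print(f"{label:16s} n={n:3d} center={center:.3f} spread={spread:.3f}")
```

In both cases the spread drops as n grows from 25 to 400, but the center of the biased frame stays far from 0.6: a larger sample makes a biased estimate more precise, not more accurate.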
