Where Do Data Come From? ● Conceptualization and operationalization of concepts --> measurement strategy --> data. ● Different strategies --> different.


1 Where Do Data Come From?
● Conceptualization and operationalization of concepts --> measurement strategy --> data.
● Different strategies --> different ways of obtaining data.
● Closer look at these different ways of collecting/producing data
● The main distinction is between observational data and experimental data
● Important ideas: Sample, Population, Simple random sample, Bias.

2 Where Do Data Come From?
● Observational Study
  – Observes individuals (units of analysis) and measures variables of interest
  – Unobtrusive; does not attempt to influence the responses
● Experiment
  – Deliberately imposes some “treatment” on the individuals
  – Studies whether the treatment causes change in the response
  – Not always feasible.
    e.g. 1. Does living close to hazardous waste sites affect children's academic performance?
    e.g. 2. Does gender explain voting behavior?

3 Observational Study (?) [Figure: Notz01-C01]

4 Observational Data Examples
● Sample Surveys. Observations on samples of a population, e.g.
  – Public opinion polls (such as the Gallup polls)
    ● Population: U.S. residents 18 years of age and over
    ● Sample: between 1000 and 1500 people
  – The Current Population Survey (unemployment and many other economic and social variables)
    ● Population: U.S. households
    ● Sample: about 60,000 households interviewed each month
  – The General Social Survey (personal and family background, experiences and habits, attitudes and opinions)
    ● Population: Adults (>=18) living in U.S. households
    ● Sample: about 3000 interviewed in person at home

5 Observational Data Examples
● Census: Tries to survey the entire population
  – Why a census? Need block-by-block data to create election districts.
  – Issues:
    ● Very expensive, in both time and money
    ● Data can be noisier than data from samples
    ● Can only attempt to cover the entire population. The 2000 census missed 0.12% of the population, including 2.78% of the black population.
  – Reform: the “long form” has been replaced by a sample survey (“American Community Survey”)

6 Experiments
● Do something to the individuals to see the causal effect of the “treatment”
● e.g., Job training for welfare mothers.
  – Under voluntary participation, self-selection can be a problem: the training group may differ systematically from the control group (in motivation, education, attitude, etc.)
  – Randomized experiments help in finding the answer
  – Estimate average effects. Individual-level effects cannot be estimated. Why? (fundamental problem of causal inference)
  – Issues: ethical considerations, compliance, etc.
  – More in the lecture on experiments
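
To make the self-selection point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration, not data from a real program: the outcome scale, the made-up "motivation" variable, the true effect of 2.0, and the rule that more motivated people are more likely to volunteer. It compares the naive treated-minus-control difference under voluntary participation with the same difference under random assignment.

```python
import random
from statistics import mean

rng = random.Random(7)   # fixed seed so the run is repeatable
N = 20_000               # hypothetical number of individuals
TRUE_EFFECT = 2.0        # assumed average effect of training on the outcome

# Each person has two potential outcomes: y0 (no training) and y1 (training).
# In a real study only ONE of the two is ever observed for a given person.
people = []
for _ in range(N):
    motivation = rng.random()                      # assumed confounder in [0, 1)
    y0 = 10 + 10 * motivation + rng.gauss(0, 1)    # outcome without training
    y1 = y0 + TRUE_EFFECT                          # outcome with training
    people.append((motivation, y0, y1))

def difference_in_means(people, treated_flags):
    """Mean observed outcome of the treated minus that of the controls."""
    treated = [y1 for (_, _, y1), t in zip(people, treated_flags) if t]
    control = [y0 for (_, y0, _), t in zip(people, treated_flags) if not t]
    return mean(treated) - mean(control)

# Voluntary participation: more motivated people are more likely to sign up.
self_selected = [rng.random() < motivation for motivation, _, _ in people]
# Randomized experiment: a coin flip decides, regardless of motivation.
randomized = [rng.random() < 0.5 for _ in people]

self_selected_estimate = difference_in_means(people, self_selected)
randomized_estimate = difference_in_means(people, randomized)

print(f"assumed true effect:     {TRUE_EFFECT:.2f}")
print(f"self-selected estimate:  {self_selected_estimate:.2f}")
print(f"randomized estimate:     {randomized_estimate:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect, because motivated people both volunteer more often and would have done better anyway, while random assignment gives a difference close to the assumed average effect. The code can look at both y0 and y1 only because it is a simulation; with real data, one of the two is always missing for each person, which is why only average effects can be estimated.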

7 Samples, Good and Bad [Figure: a population of individuals numbered 1 to 17, with some of them marked as the sample] How to draw samples?

8 Samples, Good and Bad
● Bad: the sample does not represent the population well, which systematically biases the results
  – Convenience sampling
    ● Select individuals based on convenience (e.g. mall samples; classroom samples); include those easiest to reach in the sample
  – Voluntary response sampling
    ● Allow individuals to choose to be in the sample (write-in or call-in opinion polls)
    ● e.g. Ann Landers asked readers of her advice column whether they'd have children if they were to do it all over again. 10,000 responded, with 70% saying “No!” Representative of the general population? Who likely responded? Who read her columns to start with? (See cartoon)

9 “Would you have children if you could do it all over again?” [Cartoon: Notz02_C01]
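
The voluntary response problem can be explored with a small simulation. The sketch below is illustrative only: the population size, the 10% "true" share of people who would answer "No", and the two response rates are invented numbers, chosen so that people who would answer "No" are far more likely to write in. It compares the write-in result with a simple random sample of 1000 from the same made-up population.

```python
import random

rng = random.Random(1)        # fixed seed so the run is repeatable

# All numbers below are assumptions for illustration, not survey data.
POPULATION_SIZE = 100_000
TRUE_NO_SHARE = 0.10          # assumed share who would answer "No"
RESPOND_IF_NO = 0.20          # assumed chance a "No" person writes in
RESPOND_IF_YES = 0.01         # assumed chance a "Yes" person writes in

# True means the person would answer "No".
population = [rng.random() < TRUE_NO_SHARE for _ in range(POPULATION_SIZE)]

def share_no(answers):
    """Fraction of answers that are "No"."""
    return sum(answers) / len(answers)

# Voluntary response: each person decides whether to respond,
# and the decision depends on their opinion.
voluntary = [
    says_no
    for says_no in population
    if rng.random() < (RESPOND_IF_NO if says_no else RESPOND_IF_YES)
]

# Simple random sample: chance alone decides who is included.
srs = rng.sample(population, 1000)

print(f"population share saying No:         {share_no(population):.2f}")
print(f"voluntary responders saying No:     {share_no(voluntary):.2f}")
print(f"simple random sample saying No:     {share_no(srs):.2f}")
```

With these invented rates, a large majority of the write-in responses say "No" even though only about one person in ten in the population would, while the random sample stays close to the population share. A large number of responses does not repair the bias.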

10 Samples, Good and Bad
● Good: Simple random samples
  – Get rid of the personal choice element in convenience and voluntary samples.
  – Choose a sample by impersonal chance. Ensure that every individual in the population has an equal chance of being selected into the sample (more precisely, every possible sample of the same size is equally likely to be the one chosen)
  – Idea: “Drawing names from a hat”
  – How to do it? Make use of random numbers
    ● Look up a random number table. But better yet:
    ● Use software, e.g. http://www.randomizer.org/
  – Label the individuals from 1 to N (say 1 to 340)
  – Decide the sample size k (say 34)
  – The software randomizes your list by sorting it by uniform random numbers. After the list is randomized, the first k individuals can be used as the sample. Try it.
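
The label, sort, and take-the-first-k procedure on this slide can be sketched in a few lines of Python. This is an illustration of the idea, not the code behind randomizer.org; the function name `simple_random_sample` and the seed value are made up for the example.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size labels from 1..population_size by sorting on random keys."""
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 0 and population_size")
    rng = random.Random(seed)
    labels = range(1, population_size + 1)
    # Attach an independent uniform random number to every label.
    keyed = [(rng.random(), label) for label in labels]
    # Sorting by the random number puts the labels in random order.
    keyed.sort()
    # The first k labels of the randomized list form the sample.
    return [label for _, label in keyed[:sample_size]]

# N = 340 individuals, sample size k = 34, as on the slide.
sample = simple_random_sample(340, 34, seed=2024)
print(sorted(sample))
```

In everyday use, Python's standard library does the same job directly with `random.sample(range(1, 341), 34)`; the longer version above is written out only to mirror the steps listed on the slide.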

11 Samples, Good and Bad
● When you encounter any sample (such as when reading a story about a Gallup poll), ask whether it's a “good” sample. Is it representative of the intended population? Does every individual have an equal chance of being selected?
● Look for the magical words “randomly selected” (or something equivalent)
● Things don't always go as planned. Complications in executing a random sampling plan, such as non-response, can cause the resulting sample to be non-random.
  – Missing data

12 Samples Good and Bad: More Examples
Free internet polls, e.g. http://www.misterpoll.com
Pick a poll and take a look at the results, e.g., “secret spying”, “Obama's progress so far”
Do you trust the results? (What's the sample size? Who can vote? Can one vote twice? More? Are the questions well designed?)

13 Samples Good and Bad: More Examples
Many television stations conduct polls by asking viewers to call one phone number if they feel one way about an issue and a different phone number if they feel the opposite. Does the result of such a poll represent the feelings of the community? Or does it represent the feelings of all those watching the TV station at the time? Or some other group?

14 Samples Good and Bad: More Examples
Suppose you had a telephone directory listing all of the businesses in a city, alphabetized by type of business. If you wanted to phone 100 of them in order to get a representative sampling of opinion on some issue, would it be a good idea to simply use the first 100 businesses listed?
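
One way to think about the directory question is to simulate it. The sketch below uses an entirely hypothetical directory: five invented business types with 200 listings each, and invented rates of support for "some issue" that differ by type. It compares the first 100 listings with a simple random sample of 100.

```python
import random

rng = random.Random(3)   # fixed seed so the run is repeatable

# Hypothetical business types and assumed support rates for the issue.
SUPPORT_BY_TYPE = {
    "Accountants": 0.2,
    "Bakeries": 0.5,
    "Florists": 0.6,
    "Plumbers": 0.7,
    "Restaurants": 0.9,
}
PER_TYPE = 200           # assumed number of listings per type

# Build the directory in alphabetical order of business type.
# Each entry is (business type, supports the issue?).
directory = [
    (business_type, rng.random() < rate)
    for business_type, rate in sorted(SUPPORT_BY_TYPE.items())
    for _ in range(PER_TYPE)
]

def support_share(entries):
    """Fraction of entries whose business supports the issue."""
    return sum(supports for _, supports in entries) / len(entries)

first_100 = directory[:100]                 # "just use the first 100 listed"
random_100 = rng.sample(directory, 100)     # simple random sample of 100

print("types in first 100: ", sorted({t for t, _ in first_100}))
print("types in random 100:", sorted({t for t, _ in random_100}))
print(f"support, whole directory: {support_share(directory):.2f}")
print(f"support, first 100:       {support_share(first_100):.2f}")
print(f"support, random 100:      {support_share(random_100):.2f}")
```

In this made-up directory the first 100 listings all come from a single business type, so their opinions reflect that type rather than the city's businesses as a whole, while the random sample draws from every part of the list.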

