# Data Collection, Jan 28, 2014 (Math 119 - Fall 2011)


- Identify the population in a sampling situation
- Recognize bias due to sampling methods
- Recognize sources of errors in a sample survey

- 1936 presidential election between Franklin D. Roosevelt (D) and Alfred Landon (R).
- Before the election, Literary Digest magazine conducted an opinion poll of the voting population. Its survey predicted that Landon would win, and this was widely reported.
  - the sample was drawn largely from telephone directories and similar lists
    - at the time, most home owners with telephones were Republicans
- Roosevelt won convincingly

- Observational Study
  - researchers simply observe characteristics and take measurements
    - can reveal association, not causation
- Designed Experiment
  - researchers impose treatments and controls and THEN observe characteristics and take measurements
    - can help establish causation

- Vasectomies and Prostate Cancer
  - 450,000 performed each year in the US
    - the tube carrying sperm from the testicles is cut and tied
- Study by E. Giovannucci
  - 113 cases of prostate cancer per 22,000 men with vasectomies
  - 70 per 22,000 is the expected rate
    - the study shows a ~60% elevated risk, revealing an association, but it does not establish cause
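The "~60% elevated risk" follows directly from the two rates on the slide. A minimal sketch of that arithmetic (the variable names are illustrative, not from the study):

```python
# Figures quoted on the slide: cases per 22,000 men.
observed_cases = 113   # men with vasectomies
expected_cases = 70    # expected number at the usual rate
group_size = 22_000

observed_rate = observed_cases / group_size
expected_rate = expected_cases / group_size

# Relative risk: how many times larger the observed rate is than the expected one.
relative_risk = observed_rate / expected_rate
excess_risk_pct = (relative_risk - 1) * 100

print(f"relative risk: {relative_risk:.2f}")   # about 1.61
print(f"excess risk:   {excess_risk_pct:.0f}%")  # about 61%
```

A ratio above 1 only says the two quantities move together in this sample; because the men were observed rather than assigned to a treatment, it does not by itself show cause.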

- Folic Acid and Birth Defects (study by Czeizel and István Dudás)
- 4,753 women divided into two groups
  - one group took daily multivitamins containing 0.8 mg of folic acid
  - the other group received only trace elements
- Drastic reduction in the rate of major birth defects
  - 13 per 1,000 (folic acid group) vs 23 per 1,000 (trace-element group)

- If we had simply done a survey and asked women whether they took supplements, the explanatory variable (folic acid consumption) might be confounded with other factors.
  - women who voluntarily choose to take vitamins might generally make healthier decisions and exercise more often
- Healthier decisions CONFOUND the impact of folic acid on birth defects
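Confounding can be illustrated with a small simulation. The sketch below is a made-up model, not the actual study: it assumes the supplement has no effect at all and that only a hypothetical "healthy lifestyle" trait lowers the risk. All probabilities are invented for illustration.

```python
import random

def simulate(n, randomized, seed=0):
    """Return (defect rate among takers, defect rate among non-takers).

    Hypothetical model: the supplement has NO effect; only a
    'healthy lifestyle' trait changes the risk of a birth defect.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # takes supplement -> [defects, total]
    for _ in range(n):
        healthy = rng.random() < 0.5
        if randomized:
            # Designed experiment: a coin flip decides, independent of lifestyle.
            takes = rng.random() < 0.5
        else:
            # Observational survey: healthier women choose supplements more often.
            takes = rng.random() < (0.8 if healthy else 0.2)
        risk = 0.01 if healthy else 0.03  # the supplement plays no role here
        defect = rng.random() < risk
        counts[takes][0] += defect
        counts[takes][1] += 1
    return tuple(counts[t][0] / counts[t][1] for t in (True, False))

obs_takers, obs_non_takers = simulate(200_000, randomized=False)
rct_takers, rct_non_takers = simulate(200_000, randomized=True)

print(f"survey:     takers {obs_takers:.4f} vs non-takers {obs_non_takers:.4f}")
print(f"experiment: takers {rct_takers:.4f} vs non-takers {rct_non_takers:.4f}")
```

In the survey version the takers look better off even though the supplement does nothing in this model, because lifestyle drives both the choice and the outcome. Random assignment breaks that link, so the two groups come out about equal.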

- Population
  - the group of individuals about whom we wish to get more information; typically it cannot be assessed directly
- Sample
  - a subset of the population
- Sampling Design
  - the method by which we choose the subset

A parameter is a number describing a characteristic of the population. A statistic is a number describing a characteristic of a sample.
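The parameter/statistic distinction can be made concrete with a toy example. The population below is randomly generated, made-up data; in practice the parameter is usually unknown and the statistic is used to estimate it.

```python
import random
from statistics import mean

rng = random.Random(119)

# Hypothetical population: ages of 10,000 individuals (invented data).
population = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: describes the whole population.
parameter = mean(population)

# Sample: a subset of the population; statistic: describes only that subset.
sample = rng.sample(population, 100)
statistic = mean(sample)

print(f"population mean (parameter): {parameter:.1f}")
print(f"sample mean (statistic):     {statistic:.1f}")
```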

- Whether an observational study or an experiment is used to collect data, the data has to be representative of the population.
- Let’s look at methods by which data is collected.

Definitions

- Random Sample: members of the population are selected in such a way that each individual member has an equal chance of being selected. (Contrast this with voluntary and convenience samples.)
- Simple Random Sample (of size n): subjects are selected in such a way that every possible sample of the same size n has the same chance of being chosen.

Example: sample 10 people to determine voter preference. Select 10 from the front of the room? Put names in a hat? Whichever 10 are chosen, they should be equally representative (not a convenience or voluntary sample).
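The names-in-a-hat idea can be sketched with Python's standard library, where `random.sample` draws without replacement so that every subset of size n is equally likely. The roster and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical class roster of 30 students.
roster = [f"student_{i:02d}" for i in range(1, 31)]

chosen = simple_random_sample(roster, 10, seed=1)
print(chosen)
```

Picking the 10 people at the front of the room would instead be a convenience sample: students in the back rows would have no chance of being chosen.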

Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley.

Random Sampling: selection so that each individual member has an equal chance of being selected.

Systematic Sampling: select some starting point and then select every kth element in the population.
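A minimal sketch of systematic sampling, assuming the population is already in a list and the starting point is chosen at random among the first k positions (the slide does not say how the start is picked). The function name is hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k items, then take every k-th item."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: ID numbers 0..99, sampled with k = 10.
ids = list(range(100))
picked = systematic_sample(ids, 10, seed=0)
print(picked)
```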

Convenience Sampling: use results that are easy to get.

Stratified Sampling: subdivide the population into at least two different subgroups (strata) whose members share the same characteristic, then draw a sample from each subgroup (or stratum).
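Stratified sampling can be sketched as "group first, then sample within each group". The example assumes a simple random sample of equal size from every stratum, which is one possible choice; the data and the `stratified_sample` helper are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group members by stratum, then draw a random sample from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical population of 60 people split into two age groups.
people = [
    {"name": f"person_{i}", "age_group": "30_plus" if i % 3 == 0 else "under_30"}
    for i in range(60)
]

by_group = stratified_sample(people, lambda p: p["age_group"], per_stratum=5, seed=2)
for group, members in by_group.items():
    print(group, [m["name"] for m in members])
```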

Cluster Sampling: divide the population into sections (or clusters); randomly select some of those clusters; choose all members from the selected clusters. Each cluster should be a small-scale representation of the total population.
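In contrast to stratified sampling, cluster sampling randomizes over whole groups and then keeps everyone in the chosen groups. A minimal sketch with invented precincts and voters (the `cluster_sample` helper is hypothetical):

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 8 precincts with 20 voters each.
precincts = {
    f"precinct_{p}": [f"voter_{p}_{v}" for v in range(20)]
    for p in range(8)
}

selected = cluster_sample(precincts, 3, seed=3)
for name, voters in selected.items():
    print(name, len(voters))
```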


- The Current Population Survey (CPS)
  - http://www.census.gov/cps
    - monthly survey of about 60,000 households
    - the sample is scientifically selected to represent the civilian population
  - employment status of each member of the household
  - data is used to make model-based estimates for individual states and other geographic areas
  - estimates obtained from the CPS include employment, unemployment, earnings, hours of work, and others
    - available by age, sex, race, marital status, educational attainment, school enrollment
  - used by policymakers and legislators as important indicators of our nation's economic situation


- Among the factors that contributed to the decrease in the percentage of family households with children under 18:
  - Increases in longevity: The average number of years of life remaining at age 30 increased about three years, comparing those age 30 in 1960 with baby boomers who turned 30 in 1980 (Table 11 [PDF], U.S. Life Tables, National Center for Health Statistics). As adults live longer, a larger proportion of married-couple households will be those who are older and either childless or whose adult children live elsewhere. In 1968, 29 percent of married men were age 55 and over, as were 22 percent of married women. In 2008, 38 percent of married men were 55 and over, as were 33 percent of married women.
  - Increases in childlessness: The percentage of women age 40 to 44 who were childless increased from 10 percent in 1976 to 20 percent in 2006 (Supplemental Table 1 [Excel], U.S. Census Bureau).
- Other highlights from America’s Families and Living Arrangements: 2008 include:
  - The median age for men at first marriage was 27.4 years. For women, the median age at first marriage was 25.6.
  - Among family households with children under 18, the share that had three or more of their own children present was 21 percent in both 1998 and 2008.
  - The percentage of adults ages 45 to 49 who were married varied by race and ethnicity. For example, among women 45 to 49, 79 percent of Asians, 69 percent of white non-Hispanics, 62 percent of Hispanics and 43 percent of blacks were married.
  - In 2008, 66.9 million opposite-sex couples lived together: 60.1 million were married, and 6.8 million were not.
  - The United States had an estimated 5.5 million “stay-at-home” parents: 5.3 million mothers and 140,000 fathers.
  - The percentage of children living with two parents varied by race and origin. Eighty-five percent of Asian children lived with two parents, as did 78 percent of white non-Hispanic children, 70 percent of Hispanic children and 38 percent of black children.
  - About 9 percent of all children (6.6 million) lived in a household that included a grandparent. Twenty-three percent of children living with a grandparent had no parent present.
  - In 2008, 6 percent of white non-Hispanic children lived in a household with a grandparent present, compared with 10 percent of Hispanic children, and 14 percent of both Asian and black children.

- Bureau of Economic Analysis (bea.gov)
- http://www.bea.gov/regional/gsp/action.cfm

GDP: the market value of all final goods and services made within the borders of a nation in a year.