No matter how sophisticated the statistical techniques may be, reasonably accurate inferences about a population’s characteristics can’t be made without a sample that is representative of the population and large enough. What is inferential statistics?
Producing Data Anecdotal evidence: based on haphazardly selected cases which often come to our attention because they are striking in some way. Available data: data that were produced in the past for some other purpose but that may help answer a present question. Statistical vs. non-statistical designs for producing data: e.g., statistical vs. ethnographies or other case studies. Statistical designs for producing data: these rely on either experiments or observation based on sampling.
Sampling: selects part of a population (a sample) to gain information about the whole population. A sample survey is a kind of observational study because it does not attempt to impose treatments that influence responses. Census: attempts to collect data on every individual in a population. What are the advantages of a sample survey versus a census?
Experiments: deliberately impose some treatment on individuals to observe their responses. Experimental design, when carried out correctly, is the most effective of all research designs in controlling the effects of lurking variables. Why? Because experimental design is intrinsically comparative.
But experimental results typically cannot be generalized. Why not?
The Principles of Experimental Design
The main principles of experimental design are control, randomization, & replication.
A fundamental research challenge is to minimize bias. Bias: systematic over or under estimation of values—the systematic favoring of higher or lower values. Bias usually can’t be detected by inspecting a given data set by itself. Why not? Properly done, experimental design is the most effective way of minimizing bias.
Control: comparative design, when carried out correctly, makes it more likely that influences other than the experimental variable operate equally on all experimental units (or subjects). That is, it makes it more likely that lurking variables operate equally on all experimental units. Control, then, reduces bias.
Randomization (i.e. randomized assignment): enables impersonal chance to allocate experimental units (or subjects) to the treatment or control groups. If done correctly, it makes it more likely that influences other than the experimental variable operate equally on all experimental units (or subjects). Randomization, then, reduces bias, as does comparative design.
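A minimal Python sketch of randomized assignment, assuming 20 hypothetical subject IDs and an even split between the two groups. The function name, the IDs, and the seed are illustrative and not part of the course materials. A seeded shuffle, rather than the researcher, decides who lands in the treatment group and who in the control group.

```python
import random

def randomize_assignment(units, seed=None):
    """Let chance allocate units to a treatment group and a control group."""
    rng = random.Random(seed)          # a fixed seed makes the allocation replicable
    shuffled = list(units)             # copy, so the caller's list is untouched
    rng.shuffle(shuffled)              # impersonal chance orders the units
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject IDs (illustrative only)
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
groups = randomize_assignment(subjects, seed=123)

print("treatment:", sorted(groups["treatment"]))
print("control:  ", sorted(groups["control"]))
```

Every subject ends up in exactly one group, and no trait of the subject or preference of the researcher enters the allocation.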
Replication: make sure that the number of experimental observations is large enough to permit a result that is statistically significant (i.e. an observed effect so large that it would rarely occur by chance).
Why, though, can’t experimental results typically be generalized? Because a properly conducted experiment randomizes assignment to the treatment and control groups, but it does not randomly sample from a wider population.
Summary: Experimental Design What are the basic features of experimental design? How is experimental design intrinsically comparative? In what ways is it the most effective research design? What is statistical significance (versus practical or theoretical significance)?
Potential Limitations of Experimental Designs What are the potential limitations of experimental designs? What critical questions need to be asked when assessing an experimental study?
The Principles of Sample Design
Sample Design Population: the entire group of individuals (i.e. phenomena or entities) that we are trying to understand. Sample: the part of the population that we actually examine in order to obtain information. Sample design: the method used to choose the sample from the population. Inadequate sample design & execution lead to misleading conclusions by creating bias in the selection of observations.
There are two kinds of error: (1) Bias: non-random (i.e. systematic) error, which favors either higher or lower values. (2) Variability: random error, which is just as likely to involve higher as lower values & thus has a neutral effect on central tendency. We want to minimize variability, but bias is a greater concern: Why?
The Two Basic Types of Samples Voluntary (i.e. non-probability) sample: consists of people who choose themselves by responding to a general appeal. Not a probability sample & thus highly biased. Probability sample: chosen by impersonal chance. We must know what samples are possible & what chance, or probability, each possible sample has. If conducted correctly, bias is minimized.
Basic Types of Probability Sampling Design Sampling relies on a sample frame: a list of elements from which a probability sample is drawn. Simple random sample: consists of n individuals from the population chosen so that every set of n individuals (i.e. units) has an equal chance to be the sample actually selected. Systematic random sample: randomize the list of elements to choose from; select a random starting point between 1 and k (e.g., between 1 and 10 when k = 10); and then choose every kth element from that point on for inclusion in the sample. The result is virtually identical to random sampling.
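The two designs can be sketched in a few lines of Python. This assumes a hypothetical sample frame of 1000 element IDs, a sample of 100, and a sampling interval of k = 10; the frame, seed, and variable names are invented for illustration.

```python
import random

rng = random.Random(123)            # fixed seed so the draws are replicable
frame = list(range(1, 1001))        # hypothetical sample frame: 1000 element IDs

# Simple random sample: random.sample draws without replacement, so every
# set of n = 100 elements has the same chance of being the sample.
srs = rng.sample(frame, 100)

# Systematic random sample with interval k = 10.
k = 10
shuffled = frame[:]                 # copy, so the original frame is untouched
rng.shuffle(shuffled)               # randomize the list of elements
start = rng.randrange(k)            # random start among the first k positions (0-based)
systematic = shuffled[start::k]     # then take every kth element from that point on

print(len(srs), len(systematic))
```

Both approaches yield 100 distinct elements here. The systematic version needs only one random draw once the list has been randomized.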
Stratified random sample: (1) Divides the population into groups of similar individuals (i.e. phenomena or entities), called strata. (2) Chooses a separate simple random sample within each stratum & combines them to form the full sample.
Stratified random sampling is based on an exhaustive list of the target population. The sample is proportional if the proportions of the sample chosen in the various strata are the same as those existing in the population. Required statistical correction: see Stata manual.
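A minimal Python sketch of a proportional stratified random sample, assuming an invented population of 800 undergraduates and 200 graduate students and a 10% sampling fraction in every stratum. The function name and the strata are hypothetical. It shows only the selection step, not the statistical correction mentioned above.

```python
import random
from collections import defaultdict

def proportional_stratified_sample(population, stratum_of, fraction, rng):
    """Draw a simple random sample of the same fraction within every stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)   # (1) divide the population into strata
    sample = []
    for members in strata.values():
        n_h = round(len(members) * fraction)    # stratum sample size, proportional to stratum size
        sample.extend(rng.sample(members, n_h)) # (2) separate SRS within each stratum, then combine
    return sample

rng = random.Random(123)
# Hypothetical exhaustive list of the target population: (stratum, id) pairs
population = [("undergrad", i) for i in range(800)] + [("grad", i) for i in range(200)]

sample = proportional_stratified_sample(population, lambda u: u[0], 0.10, rng)
counts = {s: sum(1 for u in sample if u[0] == s) for s in ("undergrad", "grad")}
print(counts)
```

The sample keeps the population's 80/20 split because each stratum contributes the same fraction of its members.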
According to Freedman et al., Statistics, we shouldn’t exaggerate the benefits of stratification in reducing a sample’s variance.
Multistage cluster sample: (1) Selects successively smaller groups (such as geographic units) within the population in stages, resulting in a sample consisting of clusters (i.e. groups) of individuals (i.e. phenomena or entities). (2) Samples from the clusters (i.e. not all clusters end up providing sample observations for the study). (3) Samples observations only from within the sample-selected clusters.
Each stage in a multi-stage cluster sample may employ a simple random sample, a stratified random sample, or some other type of sample. Sample observations are drawn from within the sample-selected clusters only, not from within every cluster. Cluster sampling is used when it’s impossible or impractical to compile or observe an exhaustive list from the target population.
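A two-stage version of this design can be sketched in Python, assuming 50 hypothetical census blocks of 30 households each, with a simple random sample at both stages. The block and household labels, the sizes, and the function name are invented for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: sample clusters. Stage 2: sample units within the chosen clusters only."""
    chosen = rng.sample(sorted(clusters), n_clusters)        # not every cluster is selected
    sample = []
    for cid in chosen:
        units = rng.sample(clusters[cid], n_per_cluster)     # SRS inside a selected cluster
        sample.extend((cid, unit) for unit in units)
    return chosen, sample

rng = random.Random(123)
# Hypothetical frame: only the list of blocks is needed up front; households
# have to be listed only for the blocks that end up being selected.
clusters = {
    f"block_{b:02d}": [f"household_{b:02d}_{h:02d}" for h in range(30)]
    for b in range(50)
}

chosen, sample = two_stage_cluster_sample(clusters, n_clusters=5, n_per_cluster=4, rng=rng)
print(chosen)
print(len(sample), "households sampled")
```

Observations come from 5 of the 50 blocks, which is why no exhaustive list of all households is required.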
Stratified vs. Cluster Sampling In stratified sampling, a sample is drawn within every stratum, and the strata are the groups compared. In cluster sampling, clusters are ways of identifying groups of observations, but a sample is not drawn from within every cluster. See, e.g., Agresti & Finlay, Statistical Methods for the Social Sciences, pages
Multistage cluster sample example – U.S. Census: Divide the U.S. into geographic areas within states: primary sampling units (PSUs). Divide each PSU into smaller geographic units (census blocks), then stratify the blocks by ethnic and other data. Take a stratified sample of census blocks. Sort each census block’s housing units into clusters of four nearby units. Interview the households in a probability sample of these clusters.
While a multistage cluster sample is the most commonly used sophisticated sample, it involves a key statistical problem: the observations within each sampled cluster tend to be more alike than are the observations between the sampled clusters. Such sampling therefore violates an assumption of inferential statistics: that the individuals (i.e. units or observations) are sampled not only randomly but also independently from each other. The result is less statistical information about variability than an independent sample of the same size would provide.
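A small Python simulation can make the problem visible. It assumes an invented population of 200 clusters of 20 units each, in which every unit's value is a shared cluster effect plus individual noise. It compares 100 units drawn as a simple random sample with 100 units drawn as 5 whole clusters. All numbers and names are illustrative.

```python
import random
import statistics

rng = random.Random(123)

# Hypothetical population: units in the same cluster share a cluster effect,
# so they resemble each other more than units from different clusters do.
clusters = []
for _ in range(200):
    cluster_effect = rng.gauss(0, 1)
    clusters.append([cluster_effect + rng.gauss(0, 1) for _ in range(20)])
all_units = [x for c in clusters for x in c]

def srs_mean():
    # 100 units drawn without regard to cluster membership
    return statistics.fmean(rng.sample(all_units, 100))

def cluster_mean():
    # 5 whole clusters of 20 units: also 100 units in total
    chosen = rng.sample(clusters, 5)
    return statistics.fmean([x for c in chosen for x in c])

# Repeat each design 500 times and compare the spread of the sample means.
sd_srs = statistics.stdev([srs_mean() for _ in range(500)])
sd_cluster = statistics.stdev([cluster_mean() for _ in range(500)])

print(f"SD of sample means, simple random sample: {sd_srs:.3f}")
print(f"SD of sample means, cluster sample:       {sd_cluster:.3f}")
```

With the same number of observations, the cluster sample's means scatter much more widely. Treating clustered observations as if they were independent would therefore overstate precision, which is why a survey correction is needed.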
Required statistical correction: In Stata, type ‘help svy’.
Sources of Non-Probability Bias in Probability Samples Even well-designed & well-executed probability samples can suffer from bias due to non-probability problems of:
Undercoverage: the sample frame (i.e. list of elements from which the probability sample is drawn) does not adequately cover all relevant categories of elements. Nonresponse: especially if non-randomly distributed. Response bias: due to traits/behavior of the interviewer/researcher or respondent/subject. Poorly worded questions, or problems due to the order of questions.
Descriptive vs. Inferential Statistics Descriptive statistics: summarizes the data. Inferential statistics: makes inferences from a sample to a population.
Toward Statistical Inference Statistical inference: based on impersonal chance, we use data on sampled individuals (i.e. phenomena or entities) to infer conclusions about the wider population. Parameter: a number that describes a population. It is a fixed number, but in practice we usually don’t know its value. Statistic: a number that describes a sample. A statistic’s value is known when we’ve taken a sample, but it can vary from sample to sample. We often use a statistic to estimate an unknown parameter. This is called inferential statistics.
A parameter is what we want to know about in a population: e.g., we want to know about the cholesterol levels of adults between the ages of 25 & 64 in South Florida. A statistic is what we’ll learn from a random sample: e.g., the cholesterol levels of randomly sampled adults between the ages of 25 & 64 in South Florida.
Sampling variability: the value of a statistic varies with repeated random sampling of the same size from the same population. All of statistical inference is based on one idea: to see how trustworthy a procedure is, ask what would happen if it were repeated independently many times. See Freedman et al., Statistics.
That is, ask what would happen if we took many independent random samples of the same size from the same population. Take a large number of random samples of the same size from the same population. Compute the mean for each sample. Make a histogram of the values of the sample means. Examine the distribution’s shape, center, & spread.
E.g., a medical researcher wants to estimate the cholesterol levels of South Florida adults ages 25-64. Let’s say that, as a start, the researcher measures the cholesterol values of 500 randomly sampled adults ages 25-64 (sampled by place of residence through a door-to-door random-sample survey).
This, however, is only one sample. What would happen if we repeated the random sample independently over & over again with the same size & in the same population?
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible random samples of the same size from the same population. It’s a way of conceptualizing what distribution would emerge if we could see all possible samples of the same size-n from the same population.
Regarding the cholesterol study, what if the researcher could draw all possible random samples of n=100 from South Florida’s population of adults ages 25-64? What would be the shape, center, & spread (including outliers) of the resulting distribution of cholesterol means from each of the random samples, lined up together?
Thinking about sampling distribution helps us to understand what we’re trying to accomplish in drawing one or more actual random samples of size-n from the same population: big picture versus little picture. Later in the course we’ll see its crucial importance for tests of statistical significance.
How do we begin to connect the little picture of the actual random samples of size-n to the big picture of the conceptual ideal of the sampling distribution? We do so by using a histogram to describe the means of each actual random sample of size-n, lined up together.
E.g., regarding the cholesterol levels of the South Florida population of adults ages 25-64, the researcher obtains a huge amount of funding to take 1000 random samples of size 100. The variable, of course, is cholesterol value: in each random sample of size 100, what’s the mean cholesterol level?
Each time we take a random sample of 100, we compute the sample’s mean & standard deviation for cholesterol level. And as we accumulate samples, we examine them together in one histogram to find out how much the sampled means of cholesterol either converge toward the center or spread out.
At last, we have our total of 1000 random samples of size 100 from the South Florida population of adults ages 25-64. According to the histogram, what’s the shape, center, & spread (including outliers) of the mean cholesterol levels for the 1000 random samples of size 100, lined up together?
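The thought experiment can be approximated in Python. The sketch below assumes an invented population of 50,000 adults whose cholesterol values are roughly normal with mean 200 and standard deviation 40 mg/dL; these figures are illustrative, not data from the study. It draws 1000 random samples of size 100 and prints a crude text histogram of the sample means.

```python
import random
import statistics

rng = random.Random(123)

# Hypothetical population (invented values, not real cholesterol data)
population = [rng.gauss(200, 40) for _ in range(50_000)]

# 1000 random samples of size 100; keep the mean of each sample
sample_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(1000)]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
print(f"center of the sample means:      {center:.1f}")
print(f"spread (SD) of the sample means: {spread:.2f}")

# Crude text histogram of the sample means in 2 mg/dL bins
bins = {}
for m in sample_means:
    lower = int(m // 2) * 2
    bins[lower] = bins.get(lower, 0) + 1
for lower in sorted(bins):
    print(f"{lower:3d}-{lower + 2:3d} {'#' * (bins[lower] // 5)}")
```

The means pile up symmetrically around the population mean, and their spread is far smaller than the spread of individual cholesterol values. That is the little picture approximating the big picture of the sampling distribution.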
In short, we draw a random sample to obtain a statistic that we’ll use to estimate the unknown parameter (i.e. the unknown population value). Our objective is not to do descriptive statistics, but rather to do inferential statistics.
Before we move on, what could we do to make the distribution of sample means less variable (i.e. reduce the distribution’s standard deviation)?
Make sample size-n notably larger. Let’s say that the researcher manages to do a survey of 100, but then gets funding to re-do the survey, increasing the sample size-n to 2500. What’s the difference in sample mean variability between the size 100 samples & the size 2500 samples?
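Because the standard deviation of a sample mean under simple random sampling is roughly the population standard deviation divided by the square root of n, a sample 25 times larger should have means about 5 times less variable. The Python sketch below checks this by simulation, assuming an invented population of 200,000 cholesterol values (mean 200, SD 40 mg/dL). The population, the seed, and the number of repetitions are illustrative.

```python
import random
import statistics

rng = random.Random(123)

# Hypothetical population (invented values)
population = [rng.gauss(200, 40) for _ in range(200_000)]

def sd_of_sample_means(n, reps=300):
    """Spread of the sample mean across `reps` random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

sd_100 = sd_of_sample_means(100)
sd_2500 = sd_of_sample_means(2500)
ratio = sd_100 / sd_2500

print(f"SD of sample means, n = 100:  {sd_100:.2f}")
print(f"SD of sample means, n = 2500: {sd_2500:.2f}")
print(f"ratio: {ratio:.1f}")
```

The ratio should come out near 5. Precision improves with the square root of the sample size, so halving the variability takes four times as many observations.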
Two Fundamental Problems We have to contend with two fundamental problems in drawing samples & making inferences: Bias Variability
Bias Bias means that a measurement systematically underestimates or overestimates a parameter. A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Chance errors change from measurement to measurement, sometimes up & sometimes down. Chance errors are random errors—and thus don’t affect a distribution’s central tendency. Bias, however, affects all the measurements in the same direction, either up or down. Bias, then, is systematic error, which pushes a distribution’s central tendency either up or down.
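The difference between the two kinds of error can be simulated in Python. The sketch assumes an invented population of cholesterol values and a hypothetical miscalibrated instrument that reads 10 mg/dL too high on every measurement. All numbers and names are illustrative.

```python
import random
import statistics

rng = random.Random(123)

# Hypothetical population (invented values)
population = [rng.gauss(200, 40) for _ in range(50_000)]
true_mean = statistics.fmean(population)

def sample_mean(n, offset=0.0):
    # `offset` models a systematic measurement error added to every reading
    return statistics.fmean([x + offset for x in rng.sample(population, n)])

# Chance error only: 1000 sample means from an accurate instrument
unbiased = [sample_mean(100) for _ in range(1000)]
# Chance error plus bias: the instrument reads 10 mg/dL too high
biased = [sample_mean(100, offset=10.0) for _ in range(1000)]

print(f"population mean:          {true_mean:.1f}")
print(f"accurate instrument: center {statistics.fmean(unbiased):.1f}, "
      f"spread {statistics.stdev(unbiased):.2f}")
print(f"biased instrument:   center {statistics.fmean(biased):.1f}, "
      f"spread {statistics.stdev(biased):.2f}")
```

Chance error scatters the sample means around the true value without moving their center. The bias shifts the whole distribution upward while leaving its spread unchanged, which is why it cannot be seen by inspecting the spread of a data set alone.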
What would be evidence of chance (i.e. random) errors in the cholesterol study? What would be evidence of bias in the cholesterol study? What might be sources of such bias?
Chance (i.e. random) error is also called sampling (i.e. non-systematic) error. Bias is also called non-sampling (i.e. systematic) error. See King et al., Designing Social Inquiry, on bias.
Variability The variability of a statistic is described by the spread of its sampling distribution. This spread is determined by the sampling design & the sample size n. Statistics from larger samples have smaller spreads. Variability refers to whether the estimated values fall within a relatively narrow range or are more widely scattered.
Bias & variability
See King et al. on bias versus variability (or ‘efficiency’).
Managing Bias & Variability To reduce bias use random sampling. To reduce the variability of a statistic in simple random sampling, use a (much) larger sample.
A large random sample almost always gives an estimate that is close to the parameter: the larger, the better. What matters is the sample size, not the population size: The variability of a statistic from a random sample does not notably depend on the size of the population. According to Moore/McCabe/Craig, this is true, in the strictest sense, as long as the population is at least 100 times larger than the sample, but for basic purposes it is true in general. The larger the random sample, the less variable—i.e. the more precise—the sample statistic will be.
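A quick Python simulation illustrates the point that population size barely matters. It assumes two invented populations with the same distribution, one of 10,000 and one of 500,000, each sampled repeatedly with n = 100. The sizes, distribution, and seed are illustrative.

```python
import random
import statistics

rng = random.Random(123)

def sd_of_sample_means(population, n=100, reps=1000):
    """Spread of the sample mean across repeated random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

# Two hypothetical populations with the same distribution but very different sizes
small_population = [rng.gauss(200, 40) for _ in range(10_000)]
large_population = [rng.gauss(200, 40) for _ in range(500_000)]

sd_small = sd_of_sample_means(small_population)
sd_large = sd_of_sample_means(large_population)

print(f"N =  10,000: SD of sample means = {sd_small:.2f}")
print(f"N = 500,000: SD of sample means = {sd_large:.2f}")
```

The two spreads are nearly identical even though one population is 50 times larger than the other. What drives precision is the sample size of 100.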
Freedman et al., Statistics, say: “When estimating percentages, it is the absolute size of the sample which determines accuracy, not the size relative to the population. This is true if the sample is only a small part of the population, which is the usual case” (p. 367). There is a very tiny difference, which the finite population correction factor (fpc) can compensate for: perhaps use when the sample is a large share (say, %) of the population. But using it can cause extra uncertainty for inferring the sample’s results to a population (Stats, http://www.childrens-mercy.org/stats/size/population.asp; and Carolina Population Center, http://www.cpc.unc.edu/services/computer/presentations/statatutorial/example29.htm). So use fpc only when descriptive precision, rather than inference, is the priority. See Freedman et al., pp ; and UCLA-ATS efault.htm
fpc = sqrt((N - n) / (N - 1)), where N = population size and n = sample size. “If fpc is close to 1, then there is almost no effect. When fpc is much smaller than 1, then sampling a large fraction of the population is indeed having an effect on precision” (Stats, http://www.childrens-mercy.org/stats/size/population.asp). In Stata: ‘help svy’; & see Stata manual for svy commands.
The fpc for different situations (table’s examples): “When the sample size is 50, it does not matter much whether the population is 10 thousand or 10 million. When the sample size, however, is four thousand, then we have about 23% more precision with a population of ten thousand than we would for a population of ten million” (Stats, http://www.childrens-mercy.org/stats/size/population.asp).
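The quoted figures can be recomputed from the formula fpc = sqrt((N - n) / (N - 1)), where N is the population size and n the sample size. The short Python sketch below evaluates it for the four combinations the quotation mentions; the function name is an illustrative choice.

```python
import math

def fpc(N, n):
    """Finite population correction factor for population size N and sample size n."""
    return math.sqrt((N - n) / (N - 1))

for n in (50, 4000):
    for N in (10_000, 10_000_000):
        print(f"n = {n:>5,}, N = {N:>10,}: fpc = {fpc(N, n):.4f}")
```

For n = 50 the factor stays very close to 1 for both population sizes. For n = 4000 and N = 10,000 it drops to about 0.77, meaning a standard error roughly 23% smaller than the uncorrected one, in line with the quotation.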
To repeat, possibly use fpc when the sample is a large share of the population (see Carolina Population Center, http://www.cpc.unc.edu/services/computer/presentations/statatutorial/example29.htm). But, recall, using it may cause extra uncertainty if you seek to infer the sample’s results to a population. And keep in mind that the statistical difference is slight, so it’s generally not a big deal: use fpc only when the priority is descriptive precision rather than inference. Returning to the general issue: Freedman et al. acknowledge that the relationship of sample size and accuracy to population is counterintuitive. Helpful analogy: “Every cook knows that it only takes a single sip from a well-stirred soup to determine the taste” (Stats, http://www.childrens-mercy.org/stats/size/population.asp).
Remember: the larger the random sample, the less variable—i.e. the more precise—the statistic. Required sample size has virtually nothing to do with population size. The basic rule, then: obtain the largest random sample possible.
Why Use Randomized Assignment or a Random Sample? Randomized assignment or a random sample guarantees that the results of analyzing our data are subject to the laws of impersonal probability. Nonetheless, keep in mind that proper statistical design is not the only aspect of a good sample or experiment. The sampling distribution says nothing about possible bias due to non-sampling problems such as undercoverage, nonresponse, or response bias.
Consequently, the true distance of a statistic from the parameter it is estimating can be much larger than the sampling distribution suggests. Moreover, there is no way to say how large the added error is. These non-probability problems are a further reason why conclusions are inherently uncertain.
See King et al. on the importance of reporting statistical uncertainty.
Review What are randomized assignment and random sampling? What are their purposes? What are the principal kinds of research design? What are the procedures in each kind? What are the advantages & disadvantages of each kind? What’s a population, a sample & a sample design? What’s measurement error? What’s sampling variability? What’s the basic approach of statistical inference?
What’s statistical significance? What’s a population distribution? What’s bias? What’s variability? How can we reduce bias? How can we reduce variability, & what’s the relation of such action to population size? What are the basic kinds of sample? What are the advantages & disadvantages of each kind? Besides lack of randomness, what other problems can bias a statistic?
Example: Sampling FIU’s administration hires you to find out the proportion & characteristics of FIU students who are principal caregivers for children or elders.
Will you do a census or survey, & why? Assuming that you do a survey, what kind of sample design do you choose, & why?
What sampling-based problems may cause bias or unacceptable variability? What will you do about them (or would you like to do about them)?
In Stata, type ‘help svy’ to inspect a suite of survey-statistic adjustments. See Stata manual.
How to draw a random sample with Stata
Draw a 50% random sample:
    use hsb2, clear
    * set the seed to make the sample replicable
    set seed 123
    summarize
    sample 50
    summarize, detail
Use, e.g., histogram & boxplot to graph the sampled observations.
Draw a random sample of 50 observations:
    use hsb2, clear
    summarize
    * set the seed to make the sample replicable
    set seed 123
    sample 50, count
    summarize, detail
Use, e.g., histogram or boxplot to graph the sampled observations.