Social Statistics Estimation and complex survey design Ian Plewis, CCSR, University of Manchester
PARAMETER: Population Mean, Probability, Variance etc.: μ, π, σ²
ESTIMATOR: Sample Mean, Proportion, Variance etc.: x̄, p, s²
ESTIMATE: Usually a number calculated from the observed data.
We combine the estimate and its standard error to make inferences about the parameter.
With simple random sampling, the standard error of the sample mean $\bar{x}$ is $\sigma/\sqrt{n}$, usually estimated by $s/\sqrt{n}$, where $n$ is the sample size. The standard error of $p$ is usually estimated by $\sqrt{p(1-p)/n}$.
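As a minimal sketch of these two standard errors under simple random sampling, the Python below uses the usual estimators s/√n for a mean and √(p(1 − p)/n) for a proportion. The function names and the sample values are illustrative assumptions, not taken from the slides.

```python
import math

def se_mean(values):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance s^2 with the n - 1 divisor.
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(s2) / math.sqrt(n)

def se_proportion(p, n):
    """Estimated standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical data, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(se_mean(sample))
print(se_proportion(0.5, 100))
```

Both formulas assume independent observations with equal selection probabilities, which is why they need adjusting for a stratified, clustered design.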
MILLENNIUM COHORT STUDY
Stratification
The population was stratified by UK country: England, Wales, Scotland and Northern Ireland. For England, the population was then stratified, via the stratification of electoral wards extant on 1 April 1998, into three strata:
1) The 'ethnic minority' stratum: children living in wards which, in the 1991 Census of Population, had an ethnic minority indicator of at least 30%.
2) The 'disadvantaged' stratum: children living in wards, other than those falling into stratum (1) above, which fell into the upper quartile (i.e. the poorest 25% of wards) of the ward-based Child Poverty Index (CPI).
3) The 'advantaged' stratum: children living in wards, other than those falling into stratum (1) above, which were not in the top quartile of the CPI. Advantaged is therefore a relative term in this context.
For Wales, Scotland and Northern Ireland, there were just two strata: disadvantaged and advantaged.
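The order in which the English strata are defined matters: a ward is tested for the 'ethnic minority' stratum first, and only the remaining wards are split by the Child Poverty Index. A minimal sketch of that rule follows. The function name and its two inputs are hypothetical, chosen for illustration, and do not correspond to variables in the study's data files.

```python
def england_stratum(ethnic_minority_pct, in_cpi_upper_quartile):
    """Assign an English ward to one of the three strata described above.

    ethnic_minority_pct: the ward's 1991 Census ethnic minority indicator (%).
    in_cpi_upper_quartile: True if the ward is in the poorest 25% of wards
        on the ward-based Child Poverty Index.
    Both argument names are assumptions made for this sketch.
    """
    if ethnic_minority_pct >= 30:
        return "ethnic minority"   # stratum 1 takes precedence
    if in_cpi_upper_quartile:
        return "disadvantaged"     # stratum 2: poor wards not in stratum 1
    return "advantaged"            # stratum 3: all remaining wards

# A ward that is both poor and above the 30% threshold goes to stratum 1.
print(england_stratum(35.0, True))
print(england_stratum(10.0, True))
print(england_stratum(10.0, False))
```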
Clustering
The wish to bring the broader socio-economic context into the analysis, particularly as represented by the areas or local neighbourhoods that the children live in, and the need to keep field costs down, led to the decision to cluster the sample. Moreover, the chosen method of stratification, by characteristics of electoral wards, meant that using wards, rather than alternative geographical aggregates such as postcode sectors, was the most appropriate way to implement the clustering. In addition, the issues of measuring local context and reducing fieldwork costs pointed to the advantages of including all births in selected wards in the sample, rather than sub-sampling within wards.
The sample is a disproportionately stratified cluster sample. The disproportionality means that the sample is not self-weighting and so weighted estimates of means, variances etc. are needed. The clustering implies that observations are not independent, and so allowance must be made for this dependence when sampling errors are computed. It was likely that the design effects for the sample would be greater than one. In other words, the sample would be somewhat less precise than a simple random sample of the same size would have been, although this depends on how far the gains from stratification and systematic selection are offset by the losses from clustering, which in turn would vary across measures.
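To give a feel for how clustering alone can push a design effect above one, the sketch below uses Kish's well-known approximation for equal-sized clusters, deff ≈ 1 + (b − 1)ρ, where b is the cluster size and ρ the intra-cluster correlation. This approximation is not stated in the slides, and the numbers are invented for illustration. It also ignores the effects of stratification and weighting.

```python
def kish_design_effect(cluster_size, rho):
    """Approximate design effect from clustering: 1 + (b - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * rho

def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

# Hypothetical values: 10 respondents per ward, intra-ward correlation 0.05.
deff = kish_design_effect(cluster_size=10, rho=0.05)
n_eff = effective_sample_size(n=1000, deff=deff)
print(deff, n_eff)
```

Even a small intra-cluster correlation reduces the effective sample size noticeably when clusters are large, which is why the loss from clustering varies across measures.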
[Table: Sampling Fractions and Weights across Strata]
The weights should be applied when estimating, say, a mean for the sample. In other words, the weighted mean for variable $y$ is:
$$\bar{y}_w = \frac{\sum_h w_h \sum_{i=1}^{x_h} y_{hi}}{\sum_h w_h x_h}$$
where $h$ indexes the strata and $i$ ($i = 1, \dots, x_h$) indexes the elements in a stratum, so $x_h$ is the sample size for stratum $h$. The appropriate weights $w_h$ from the table should be used.
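A minimal sketch of this stratum-weighted mean, assuming it takes the usual form (sum over strata of w_h times the stratum total of y) divided by (sum over strata of w_h times x_h). The weights and values below are invented for illustration and are not the study's weights.

```python
def weighted_mean(strata):
    """Weighted mean over strata.

    strata: list of (w_h, y_values) pairs, where w_h is the stratum weight
    and y_values holds the observations sampled from that stratum.
    """
    numerator = sum(w * sum(ys) for w, ys in strata)
    denominator = sum(w * len(ys) for w, ys in strata)
    return numerator / denominator

# Hypothetical example: an under-sampled stratum (weight 2.0) and an
# over-sampled stratum (weight 0.5).
strata = [
    (2.0, [1.0, 3.0]),
    (0.5, [10.0, 10.0, 10.0, 10.0]),
]
unweighted = sum(sum(ys) for _, ys in strata) / sum(len(ys) for _, ys in strata)
print(weighted_mean(strata), unweighted)
```

The unweighted mean is pulled towards the over-sampled stratum. Applying the weights restores each stratum's share of the population.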
[Table: Sampling Errors, Proportion of Cohort Members Regarded as Non-white by Stratum and Country]
[Table: Sampling Errors, Proportion of Natural Mothers who do not have a Longstanding Illness by Stratum and Country]
[Table: Sampling Errors, Family Income, Natural Mothers by Stratum and Country]
Representation of design effects in Stata
MEFF: ratio of the variance allowing for all aspects of the study design, including all weights, to the variance if the sample were a simple random sample (srs) of the same size.
MEFT: √MEFF
DEFF: ratio of the variance allowing for all aspects of the sample design to the variance if the sample were an srs of the same size.
DEFT: √DEFF
See Kreuter, F. and Valliant, R. (2007) Stata Journal, 7, 1-21.
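The relation between the variance-scale and standard-error-scale measures can be sketched in a few lines of Python. This is an illustration of the ratio definitions only, not of how Stata computes them, and the two variances are invented numbers.

```python
import math

def design_effects(var_design, var_srs):
    """Return (DEFF, DEFT) from a design-based variance and an srs variance."""
    deff = var_design / var_srs   # ratio of variances
    deft = math.sqrt(deff)        # ratio of standard errors
    return deff, deft

# Hypothetical variances of an estimated proportion.
deff, deft = design_effects(var_design=0.0009, var_srs=0.0004)
print(deff, deft)

# DEFT is the factor by which an srs-based standard error (and hence a
# confidence interval width) must be inflated to allow for the design.
se_srs = math.sqrt(0.0004)
se_design = deft * se_srs
print(se_design)
```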
Types of populations
The MCS population definition is a finite one; we could in principle count all its members. The distinction between a finite and an infinite population is an important one for statistical inference and analysis. Finite populations are often regarded as samples from an infinite or super-population or universe. Inferences about finite populations are essentially descriptive. They are widely used in sample surveys, especially by official statisticians. A perfect Census of Population, in these circumstances, is not a sample: the resulting inferences are made with complete certainty. Descriptive inferences from finite populations are often known as design-based inferences.
Often, we are more interested in analytic or model-based inferences. We want to test a hypothesis, for example, or we want to estimate a relation between two or more variables. We are then more interested in an infinite or super-population:
"As a basis for scientific generalizations and decisions, a census is only a sample … A census describes a population that is subject to the variations of chance because it is only one of the many possible populations that might have resulted from the same underlying system of social and economic causes. A sample enquiry is then a sample of a sample, and a so-called 100% sample is simply a larger sample, but is still only a sample." (quotes from Deming and Stephan, 1941, p.48).
"Suppose we have data obtained by measuring the elements … in a probability sample … drawn from the finite population U. Two important types of inference are as follows:
a. Inference about the finite population U itself.
b. Inference about a model or a superpopulation thought to have generated U." (Särndal et al., 1992)
Provided we have selected our sample from the finite population by probability sampling (the simplest case of which is simple random sampling), we can rely on the idea of repeatedly sampling in the same way from the population in order to generate what is known as a randomisation distribution, which we can use to make inferences about a parameter of interest. A probability sample is one in which each member of the finite population has a known and non-zero chance of being selected into the sample. In simple random sampling, these known selection probabilities are equal.
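The idea of a randomisation distribution can be made concrete for a very small finite population by listing every possible simple random sample and computing the sample mean for each. The sketch below does this for an invented population of five values and samples of size two. In practice the distribution is a conceptual device and is never enumerated.

```python
from itertools import combinations
from statistics import mean

# Hypothetical finite population U of N = 5 elements.
population = [3.0, 7.0, 8.0, 12.0, 20.0]
n = 2

# Every possible simple random sample of size n (without replacement).
samples = list(combinations(population, n))

# The randomisation distribution of the sample mean.
sample_means = [mean(s) for s in samples]

# Averaged over all possible samples, the sample mean equals the
# population mean, so it is design-unbiased.
print(mean(sample_means), mean(population))

# Each element appears in the same share of samples: its known,
# equal selection probability n / N.
inclusion = [sum(1 for s in samples if y in s) / len(samples) for y in population]
print(inclusion)
```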
A probability sample is not essential for model-based inference. Instead, we rely on being able to define the statistical or probability model that generated the data, and then use the likelihood of the parameter given the model and the data for estimation. On the other hand, a probability sample from a finite population guards against bias and is usually regarded as desirable. Finite populations can be defined unambiguously. Super-populations, although conceptually important, are abstractions that are less easily defined. They will usually extend the finite population across space and time.
Classic reference: Kish, L. (1965) Survey Sampling. New York: Wiley.
More recent reference: Särndal, C.-E., Swensson, B. and Wretman, J. (1992) Model Assisted Survey Sampling. New York: Springer.