Sampling designs using the National Pupil Database Some issues for discussion by Harvey Goldstein (University of Bristol) & Tony Fielding (University of Birmingham)
Size of data set
The data set already contains some 3000k longitudinal records and grows by about 600k a year. Reasonably complex analyses, e.g. value added multilevel models, are already time consuming to carry out. It is worth investigating the efficiency of sampling the database, either as a whole or for specific subpopulations such as LEAs. Traditional sampling theory can be used for simple statistics such as means or regression coefficients, and there is a literature on power calculations for multilevel models (see the ESRC research project by Browne at Nottingham).
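As a rough illustration of the "traditional sampling theory" case, the sketch below estimates a mean from a simple random sample drawn without replacement and attaches a standard error with the finite population correction. The population here is synthetic (100,000 simulated scores rather than real NPD records), and the function name `srs_mean_estimate` is made up for this example.

```python
import math
import random
import statistics

def srs_mean_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample (no replacement)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    # Finite population correction: matters once n is a sizeable share of N.
    fpc = math.sqrt(1 - n / len(population))
    se = fpc * sd / math.sqrt(n)
    return mean, se

# Synthetic stand-in for the database (assumption: scores ~ Normal(50, 10)).
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(100_000)]

mean, se = srs_mean_estimate(population, n=2_000)
print(f"estimate {mean:.2f}, standard error {se:.3f}")
```

With a sample of 2% of the records, the standard error is already a small fraction of the score scale, which is the kind of trade-off the slide suggests investigating.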
Special features of the NPD
The population characteristics are known and can be used for drawing efficient samples.
The possibility of an adaptive design exists, e.g.:
– Select a random subsample to determine relationships of interest (the equivalent of a pilot study)
– Fit a suitable model to estimate parameter values
– Choose parameters of interest together with their confidence intervals (CIs)
– Increase the sample size to establish the relationship between CI width and sample size, and extrapolate to the sample size needed to achieve the required interval size
– Any statistic of interest (in addition to the CI) can be chosen
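The adaptive steps above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the model is a simple least-squares regression of outcome on prior attainment, the data are synthetic, the pilot samples are drawn independently rather than grown incrementally, and the extrapolation assumes the CI width shrinks as c / sqrt(n). The function names are hypothetical.

```python
import math
import random

def slope_ci_width(xs, ys, z=1.96):
    """Width of the approximate 95% CI for the least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    return 2 * z * se

def required_sample_size(records, target_width, pilot_sizes, seed=0):
    """Extrapolate from pilot samples to the n giving the target CI width."""
    rng = random.Random(seed)
    constants = []
    for n in pilot_sizes:
        sample = rng.sample(records, n)
        xs = [x for x, _ in sample]
        ys = [y for _, y in sample]
        # Assumption: width = c / sqrt(n), so each pilot gives an estimate of c.
        constants.append(slope_ci_width(xs, ys) * math.sqrt(n))
    c = sum(constants) / len(constants)
    return math.ceil((c / target_width) ** 2)

# Synthetic (prior attainment, outcome) pairs standing in for pupil records.
rng = random.Random(42)
records = []
for _ in range(50_000):
    prior = rng.gauss(0, 1)
    records.append((prior, 0.7 * prior + rng.gauss(0, 0.7)))

n_needed = required_sample_size(records, target_width=0.05,
                                pilot_sizes=[500, 1000, 2000])
print("sample size needed for a CI width of 0.05:", n_needed)
```

Any other statistic could replace `slope_ci_width`, provided its relationship with sample size is re-estimated rather than assumed.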
Complex designs and replication
For multilevel models and designs where interest focuses on special groups (e.g. low achievers), we need good choices for the number of higher level units (schools) and the numbers in the groups. A similar adaptive approach can be used, evaluating CIs or significance levels as the design parameters are altered. We also have the opportunity to replicate an analysis by selecting an independent sample from the database.
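To illustrate how a design parameter such as the number of schools can be varied and evaluated, the sketch below compares two two-stage samples of the same total size (2,000 pupils) on the standard error of an overall mean. Everything here is an assumption made for illustration: the synthetic population, the variance components (school-level variance 0.25, pupil-level variance 1), and the simple estimator based on the spread of school means, which ignores finite population corrections.

```python
import math
import random
import statistics

def make_population(n_schools=600, pupils_per_school=200, seed=0):
    """Synthetic two-level population: a school effect plus pupil-level noise."""
    rng = random.Random(seed)
    schools = []
    for _ in range(n_schools):
        school_effect = rng.gauss(0, 0.5)
        schools.append([school_effect + rng.gauss(0, 1.0)
                        for _ in range(pupils_per_school)])
    return schools

def two_stage_se(schools, n_schools, n_pupils, seed=0):
    """Sample schools, then pupils within each; SE of the mean of school means."""
    rng = random.Random(seed)
    chosen = rng.sample(schools, n_schools)
    school_means = [statistics.fmean(rng.sample(s, n_pupils)) for s in chosen]
    return statistics.stdev(school_means) / math.sqrt(n_schools)

schools = make_population()
se_few = two_stage_se(schools, n_schools=20, n_pupils=100)
se_many = two_stage_se(schools, n_schools=200, n_pupils=10)
print(f"20 schools x 100 pupils: SE = {se_few:.3f}")
print(f"200 schools x 10 pupils: SE = {se_many:.3f}")
```

Rerunning `two_stage_se` with a different `seed` draws an independent sample from the same population, which is the replication idea mentioned above.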
Using all the data
When analysing a given sample we will also generally have available data related to the sample members, e.g.:
– School level averages for each pupil in a study
– School level data for previous schools attended
– School level data for previous years
– LEA data for previous years
– School data for neighbouring schools
All such data can be incorporated into a model, increasing the number of variables but not the sample size.
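A minimal sketch of the first item: school averages are computed from every record in the database and then attached to the sampled pupils, so each sampled record gains a variable while the sample size stays the same. The record layout (`pupil_id`, `school_id`, `score`) and the function names are hypothetical, not the NPD schema.

```python
from collections import defaultdict

def school_means(pupils, key="score"):
    """Average of `key` per school, computed over all records supplied."""
    totals = defaultdict(lambda: [0.0, 0])
    for pupil in pupils:
        entry = totals[pupil["school_id"]]
        entry[0] += pupil[key]
        entry[1] += 1
    return {school: total / count for school, (total, count) in totals.items()}

def attach_school_mean(sample, means, name="school_mean_score"):
    """Return copies of the sampled records with the school average added."""
    return [{**pupil, name: means[pupil["school_id"]]} for pupil in sample]

# Tiny made-up "database": two schools, three pupils each.
all_pupils = [
    {"pupil_id": 1, "school_id": "A", "score": 40.0},
    {"pupil_id": 2, "school_id": "A", "score": 50.0},
    {"pupil_id": 3, "school_id": "A", "score": 60.0},
    {"pupil_id": 4, "school_id": "B", "score": 55.0},
    {"pupil_id": 5, "school_id": "B", "score": 65.0},
    {"pupil_id": 6, "school_id": "B", "score": 75.0},
]

sample = [all_pupils[0], all_pupils[4]]  # the pupils selected for analysis
enriched = attach_school_mean(sample, school_means(all_pupils))
print(enriched)
```

The same pattern would apply to the other items in the list, with the lookup keyed on previous school, year, or LEA instead.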
Other possibilities
– Poststratification: using population distributions to re-weight statistics or to incorporate weights in model estimation
– Setting up an archive of results that may be useful for designing samples
– Using PLASC to select a research sample, subject to appropriate permissions
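Poststratification of a simple statistic can be sketched as below: each stratum's sample mean is weighted by that stratum's known share of the population. The strata labels, counts and scores are invented for illustration, and the function name is hypothetical.

```python
def poststratified_mean(sample, population_counts):
    """Re-weight stratum sample means by known population stratum shares.

    sample: list of (stratum, value) pairs
    population_counts: dict mapping stratum -> population size
    """
    sums, counts = {}, {}
    for stratum, value in sample:
        sums[stratum] = sums.get(stratum, 0.0) + value
        counts[stratum] = counts.get(stratum, 0) + 1
    missing = set(population_counts) - set(counts)
    if missing:
        # Without sampled members a stratum mean cannot be estimated.
        raise ValueError(f"no sampled members in strata: {sorted(missing)}")
    total = sum(population_counts.values())
    return sum(population_counts[s] / total * (sums[s] / counts[s])
               for s in population_counts)

# Made-up example: stratum_a is over-represented in the sample
# (2 of 6 sampled, but only 20% of the population).
sample = [("stratum_a", 40.0), ("stratum_a", 44.0),
          ("stratum_b", 50.0), ("stratum_b", 54.0),
          ("stratum_b", 52.0), ("stratum_b", 56.0)]
population_counts = {"stratum_a": 200, "stratum_b": 800}

unweighted = sum(value for _, value in sample) / len(sample)
weighted = poststratified_mean(sample, population_counts)
print(f"unweighted {unweighted:.2f}, poststratified {weighted:.2f}")
```

Because the population distributions are known exactly in this setting, the weights carry no estimation error of their own.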