BHPS User Group, 19 January 2001. Presentation transcript:
Overview
News on progress with data availability and access.
Uses of the new sub-samples and weighting.
Open session: issues of interest (note that we should be dealing in part with the use of histories and family data in the afternoon).
Our aim over the day is a mixture of presentations and responses to queries, but also to gather information on how we can improve user support (e.g. training).
Training programme
Currently a two-week BHPS data confrontation workshop in the Essex summer school.
A new programme of two-day courses, a mixture of:
–basic introduction
–more specialist courses related to particular research interests
–targeted courses in Scotland and Wales (and Newcastle and Manchester?)
We are currently planning a basic course in Colchester in May.
Also note the User Conference in July.
Using BHPS with multiple samples
BHPS had a simple design up to wave six.
From wave seven: ECHP.
From wave nine: Scotland and Wales.
From wave eleven: ?
New samples mean researchers need to ask themselves additional questions when sub-setting data.
It was always necessary to ask the following:
–Which cases?
–Which waves?
–What final structure (e.g. pooled or cross-wave match)?
–Should the analysis be weighted, and using which weights?
BHPS data - basic design
Standard set of files for each wave, following the basic questionnaire structure:
–Households
–All individuals
–Respondent individuals
–Additional repeating group data (income sources, jobs)
Naming is consistent across waves, except for the wave prefix, so adding an additional wave is usually a simple replication.
Match between files within a wave using wHID, wPNO etc.
Match across waves using PID.
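The two kinds of match described on this slide can be sketched in a few lines of Python. This is only an illustration on toy records, not BHPS code: the letter prefixes "a" and "b" for two consecutive waves, the lower-case variable names, and the household field "aregion" are assumptions made for the example.

```python
def wave_var(prefix, stem):
    """Build a wave-specific variable name, e.g. ('a', 'hid') -> 'ahid'."""
    return f"{prefix}{stem}"


def match_within_wave(prefix, individuals, households):
    """Attach household-level fields to each individual using wHID."""
    hid = wave_var(prefix, "hid")
    by_hid = {h[hid]: h for h in households}
    return [{**by_hid[p[hid]], **p} for p in individuals if p[hid] in by_hid]


def match_across_waves(first, second):
    """Cross-wave match on PID: keep persons present in both waves."""
    second_by_pid = {p["pid"]: p for p in second}
    return [{**p, **second_by_pid[p["pid"]]}
            for p in first if p["pid"] in second_by_pid]


# Toy data; "aregion" is a hypothetical household-level variable.
a_households = [{"ahid": 1001, "aregion": "London"}]
a_individuals = [
    {"pid": 1, "ahid": 1001, "apno": 1},
    {"pid": 2, "ahid": 1001, "apno": 2},
]
b_individuals = [{"pid": 1, "bhid": 2001, "bpno": 1}]

within = match_within_wave("a", a_individuals, a_households)
wide = match_across_waves(within, b_individuals)
```

Because only the prefix changes between waves, the same `match_within_wave` call works for any wave by passing a different prefix, which is the "simple replication" the slide refers to.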
How the new samples fit into the design
New samples are incorporated into the standard file structure.
In the data provided, their records are generally identical to cases from the first sample (except e.g. new entrant data).
They can be identified using the variables:
–wHHORIG (household level)
–wMEMORIG (individual level)
–MEMORIG (cross-wave files)
ECHP sub-sample
Starts at wave seven, when BHPS replaced the former UK-ECHP.
The sub-sample consists of the surviving Northern Ireland sample and a Great Britain sub-sample selected to over-represent low-income households.
Selected on the basis of ECHP wave 3 proxy measures (income was not yet available):
–HRP unemployed now or in the last year
–HRP receiving lone parent benefit
–Rented housing
–Means-tested welfare benefit
At wave 7: 1710 respondents (235 in Northern Ireland).
Scotland and Wales extension samples
ESRC response to devolution.
Now (probably) funded to 2003.
First wave in 1999/2000 (BHPS wave 9).
Sample structure similar to BHPS wave one: sub-regional stratification, Highlands and Islands.
Scotland: 1459 households, 2407 respondents.
Wales: 1428 households, 2430 respondents.
Additional questions on national identity etc.
Release with main BHPS wave 9 this month.
Case selection issues
Exclude the new samples if longitudinal analysis requires data back to wave one.
They can be included in cross-sectional analysis for recent waves.
They can be included in longitudinal analysis of recent waves.
But their selection probabilities are different, so inclusion is likely to require reweighting (partly analogous to new entrants to the first sample).
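A minimal sketch of this selection rule, assuming the cross-wave MEMORIG variable is available on each person record. The code value used for the first sample and the other values in the toy data are hypothetical; check the documentation for the real coding.

```python
ORIGINAL_SAMPLE = 1  # hypothetical MEMORIG code for the first sample


def select_cases(persons, needs_wave_one):
    """Drop new-sample cases when the analysis needs data back to wave one.

    For recent-wave analysis all cases are kept, but the caller must then
    use weights that allow for the different selection probabilities.
    """
    if needs_wave_one:
        return [p for p in persons if p["memorig"] == ORIGINAL_SAMPLE]
    return list(persons)


# Toy records; codes 2 and 3 stand in for new-sample origins (hypothetical).
persons = [
    {"pid": 1, "memorig": 1},
    {"pid": 2, "memorig": 2},
    {"pid": 3, "memorig": 3},
]

back_to_wave_one = select_cases(persons, needs_wave_one=True)
recent_waves_only = select_cases(persons, needs_wave_one=False)
```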
Weighting issues - general
Weighting is intended to adjust for situations where the analysis sample is not a random sample of the population of inference.
Two types of departure:
–Unequal selection probabilities
–Missing data not completely at random
In a complex panel study there are many possible populations of inference, e.g.:
–GB population of 1991, surviving to 1998 (for longitudinal analysis)
–GB population of 1998 (for cross-sectional analysis)
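The first kind of departure, unequal selection probabilities, can be illustrated with a small worked example. The numbers are invented: a population of 1,000 low-income and 9,000 other people, with the low-income group sampled at ten times the rate of the rest. Each case is weighted by the inverse of its selection probability. The sketch does not address the second kind of departure (missing data), which needs a response model.

```python
def weighted_proportion(cases):
    """Inverse-probability-weighted proportion.

    cases: iterable of (indicator, selection_probability) pairs,
    where indicator is 1 if the case has the characteristic, else 0.
    """
    numerator = sum(y / p for y, p in cases)
    denominator = sum(1 / p for _, p in cases)
    return numerator / denominator


# Invented sample: 100 low-income cases (p = 0.1, half of them poor)
# and 90 other cases (p = 0.01, one in ten of them poor).
sample = (
    [(1, 0.1)] * 50 + [(0, 0.1)] * 50
    + [(1, 0.01)] * 9 + [(0, 0.01)] * 81
)

unweighted = sum(y for y, _ in sample) / len(sample)  # 59 / 190
weighted = weighted_proportion(sample)                # 1400 / 10000
```

The unweighted figure (about 0.31) overstates poverty because the low-income group is over-represented; the weighted figure recovers the population rate of 0.14 that was built into the example.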
Weighting BHPS
In BHPS we have longitudinal weights (wLEWGHT, wLRWGHT) for longitudinal populations of inference, and cross-sectional weights (wXEWGHT, wXRWGHT) for cross-sectional ones.
Cross-sectional weights incorporate new entrants and adjust for their unequal selection probability. (New entrants can be identified through the variable wSAMPST.)
Use the longitudinal weight from the last year of the sequence being analysed.
Longitudinal weights currently exclude cases who were missing in an intermediate year.
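The rule "use the longitudinal weight from the last year of the sequence" can be sketched as below. Several things are assumptions for the example: waves are identified by single-letter prefixes that sort in wave order, variable names are lower-case, wLRWGHT and wLEWGHT are taken to be the respondent and enumerated-individual variants, and excluded cases are represented by a zero or absent weight.

```python
def longitudinal_weight_name(wave_prefixes, respondent=True):
    """Name of the longitudinal weight for the last wave in the sequence."""
    last = sorted(wave_prefixes)[-1]
    stem = "lrwght" if respondent else "lewght"
    return f"{last}{stem}"


def usable_cases(persons, weight_var):
    """Keep cases with a positive weight (assumed marker of inclusion)."""
    return [p for p in persons if p.get(weight_var, 0) > 0]


# Toy analysis spanning three waves with hypothetical prefixes a, b, c.
weight_var = longitudinal_weight_name(["a", "b", "c"])
persons = [
    {"pid": 1, "clrwght": 1.2},
    {"pid": 2, "clrwght": 0.0},  # e.g. missing in an intermediate year
    {"pid": 3},                  # no longitudinal weight on file
]
analysis_sample = usable_cases(persons, weight_var)
```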
When to use BHPS weights
Always for descriptive population estimates (e.g. proportion in poverty, proportion moving from poverty to non-poverty).
Weighted cross-sectional analyses should include new entrant sample members:
–weights are adjusted to take account of their presence.
There is an argument about regression etc. models.
The survey statistics perspective is that the only cost of weighting is a (modest) increase in standard errors.
The gain is some protection against model misspecification.
Weights for the new samples
Standard weights continue to exclude these cases.
New cross-sectional weights from wave seven include ECHP cases (wXEWGHTE, wXRWGHTE).
Wave nine includes two sets of cross-sectional weights for Scotland and Wales cases:
–One set permits analysis of these samples on their own
–A second set permits UK analysis incorporating these samples
The wave ten release will include longitudinal weights for the new samples.
Family and Household Linkages
A key advantage of BHPS is the collection of longitudinal data about linked individuals within households, and the following of individuals as they move.
This leads to research on:
–interactions between household members in behaviour, decision making and outcomes over time (applications in political research, labour market participation, parental impacts on children, migration)
–household and family formation and dissolution processes, and their causes and consequences
Technical issues in household linkage - cross-sectional
The household grid (wINDALL) contains both the relationship to the household reference person and the person numbers of spouse, mother and father.
The file wEGOALT provides information about the relationship of all individuals in a household to each other.
Many other person identifiers relative to the subject can be found at many points in the data (e.g. wAIDHUA, the person number of a person cared for within the household).
These identifiers can all be used to match data between individuals at a single wave.
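As an illustration of matching between individuals at a single wave, the sketch below attaches one field from the spouse's record to each person, using the household identifier plus the spouse's person number. The field names "sppno" (spouse person number) and "employed" are hypothetical stand-ins, as is the wave prefix "a"; substitute the real variable names from the household grid.

```python
def attach_spouse_field(records, prefix, field):
    """Add 'sp_<field>' to each record, taken from the spouse's record.

    Records are keyed by (household id, person number) within one wave.
    Persons with no spouse in the household get None.
    """
    hid, pno, sppno = f"{prefix}hid", f"{prefix}pno", f"{prefix}sppno"
    index = {(r[hid], r[pno]): r for r in records}
    linked = []
    for r in records:
        spouse = index.get((r[hid], r.get(sppno)))
        linked.append({**r, f"sp_{field}": spouse[field] if spouse else None})
    return linked


# Toy household grid: a couple in household 1001, a single person in 1002.
grid = [
    {"pid": 1, "ahid": 1001, "apno": 1, "asppno": 2, "employed": True},
    {"pid": 2, "ahid": 1001, "apno": 2, "asppno": 1, "employed": False},
    {"pid": 3, "ahid": 1002, "apno": 1, "asppno": None, "employed": True},
]
linked = attach_spouse_field(grid, "a", "employed")
```

The same pattern works for any within-household pointer (mother, father, person cared for), since each is a person number that is only meaningful together with the household identifier for that wave.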
Technical issues in household linkage - longitudinal
Where there is no household composition change, longitudinal matching is straightforward.
It is easiest to start analysis of household composition change using wEGOALT data.
wEGOALT holds two cases for each pair of persons in a household (exchanging ego and alter):
–Variable wLWSTAT indicates whether alter was in the same household as ego last wave
–Variable wNWSTAT indicates whether alter is in the same household as ego next wave
So e.g. wREL=1 (married) and wNWSTAT=2 (different household) indicates a marital separation.
Aggregating across cases allows computation of measures of overall household change.
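The marital separation example can be sketched as follows. The codes wREL=1 and wNWSTAT=2 are taken from the slide; everything else is an assumption for illustration, including the wave prefix "h", the field names "ego_pno" and "alter_pno", and the code 1 for "same household next wave". Because each pair appears twice (once with each person as ego), pairs are de-duplicated before counting.

```python
MARRIED = 1       # wREL code given on the slide
DIFFERENT_HH = 2  # wNWSTAT code given on the slide


def marital_separations(egoalt_rows, prefix):
    """Return the set of (household, couple) pairs that separate by next wave."""
    hid, rel, nwstat = f"{prefix}hid", f"{prefix}rel", f"{prefix}nwstat"
    couples = set()
    for row in egoalt_rows:
        if row[rel] == MARRIED and row[nwstat] == DIFFERENT_HH:
            pair = frozenset((row["ego_pno"], row["alter_pno"]))
            couples.add((row[hid], pair))
    return couples


# Toy ego-alter rows: household 1 separates, household 2 stays together.
rows = [
    {"hhid": 1, "ego_pno": 1, "alter_pno": 2, "hrel": 1, "hnwstat": 2},
    {"hhid": 1, "ego_pno": 2, "alter_pno": 1, "hrel": 1, "hnwstat": 2},
    {"hhid": 2, "ego_pno": 1, "alter_pno": 2, "hrel": 1, "hnwstat": 1},
    {"hhid": 2, "ego_pno": 2, "alter_pno": 1, "hrel": 1, "hnwstat": 1},
]

separations = marital_separations(rows, "h")
households_affected = {household for household, _ in separations}
```

Aggregating the resulting pairs by household, as in `households_affected`, gives one simple household-level measure of change of the kind the slide mentions.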