unido.org/statistics
International workshop on industrial statistics, 8–10 July, Beijing
Non-response in industrial surveys
Shyam Upadhyaya
What is non-response?
- Failure to obtain data from some units of the target population of a survey.
- Unlike surveys of human populations, which are relatively homogeneous, industrial surveys can be seriously affected by non-response.
- Larger establishments account for a higher share of the estimated totals, as well as of the variance of key variables.
- Some non-response is always expected, so a plan for its treatment should be prepared in advance.
How does non-response affect the survey? Conceptual framework
Response rate
                    In the frame             Not in the frame          Total
Within scope        A1                       A2                        In scope = A1 + A2
Outside the scope   B1                       B2                        Outside the scope = B1 + B2
Total               In the frame = A1 + B1   Missing units = A2 + B2
- The response rate is the ratio of the statistical units actually observed to the number of units eligible for the survey.
- This ratio cannot be determined reliably when the frame is imperfect.
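In the notation of the table, the definition can be sketched as follows. The symbol $r$ for the number of responding units is introduced here for illustration and does not appear on the slide:

```latex
\text{Response rate} = \frac{r}{A_1 + A_2}
```

With an imperfect frame, $A_2$ (eligible units missing from the frame) is unknown and the frame also contains $B_1$ out-of-scope units, so the denominator cannot be read off the frame directly.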
The existing frame should be updated with additional information from listing operations or administrative sources.
Measurement of response rates
- Unit response rate (URR): particularly important for monitoring the progress of the survey.
- Weighted response rate (WRR): the share of respondents in the total value of a key variable of the survey (in a sample survey, w denotes the design weights).
- For survey estimates, the WRR is more informative because it reflects the actual coverage, and thus the representativeness, of the survey.
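The formulas themselves did not survive in this transcript, so the following is only a minimal sketch of one common way to compute the two rates. It assumes that a size measure (for example, employment recorded in the frame) is known for every eligible unit, including non-respondents. The `Unit` class, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    weight: float    # design weight w (1.0 for every unit in a census)
    size: float      # key variable known for all eligible units (assumed: frame employment)
    responded: bool


def unit_response_rate(units):
    """Unweighted share of eligible units that responded."""
    return sum(u.responded for u in units) / len(units)


def weighted_response_rate(units):
    """Respondents' share of the weighted total of the key variable."""
    total = sum(u.weight * u.size for u in units)
    observed = sum(u.weight * u.size for u in units if u.responded)
    return observed / total


# Hypothetical sample: the large establishment responds, two small ones do not.
sample = [
    Unit(weight=1.0, size=900, responded=True),
    Unit(weight=5.0, size=20, responded=True),
    Unit(weight=5.0, size=10, responded=False),
    Unit(weight=5.0, size=10, responded=False),
]
urr = unit_response_rate(sample)
wrr = weighted_response_rate(sample)
```

In this illustrative sample only half of the units respond, yet the respondents cover most of the weighted total, which is the situation in which the WRR exceeds the URR.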
Variation of URR and WRR by sub-population
- URR and WRR are rarely equal because establishments vary in size.
- If a better response is achieved from larger establishments, the WRR is higher.
Types of non-response
1. Unit non-response: some statistical units give no response at all.
2. Item non-response: some statistical units provide incomplete data (data are missing for some variables within the unit).
3. Wave non-response: may occur in panel surveys, when some statistical units respond in one round but not in another.
How to handle non-response?
- Treatment of non-response depends on the type of non-response as well as the type of survey.
                Unit non-response                                                                       Item non-response
Sample survey   Weight adjustment to reflect the reduction of sample size                               Imputation
Census          No internal solution; external sources such as admin data or past survey data           Imputation
Unit non-response
- In a sample survey: weight adjustment (design weight → estimation weight).
- Non-response in a sample survey is treated as a reduction of the sample size. The design weight is then inflated, on the assumption that non-response occurred at random.
- In a census: there are no weights to adjust. Other ways to compensate for unit non-response are administrative data, or earlier survey data adjusted with applicable growth rates.
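The weight adjustment described above can be sketched as follows. This is a minimal illustration, assuming the adjustment is made separately within each stratum and that non-response is random within a stratum. The dictionary keys and the example data are hypothetical.

```python
from collections import defaultdict


def adjust_weights(units):
    """Turn design weights into estimation weights, stratum by stratum.

    units: list of dicts with keys 'stratum', 'design_weight', 'responded'.
    Returns estimation weights aligned with units (0.0 for non-respondents).
    """
    sampled = defaultdict(float)    # sum of design weights of all sampled units
    responded = defaultdict(float)  # sum of design weights of respondents only
    for u in units:
        sampled[u["stratum"]] += u["design_weight"]
        if u["responded"]:
            responded[u["stratum"]] += u["design_weight"]

    weights = []
    for u in units:
        if not u["responded"]:
            weights.append(0.0)
            continue
        # Inflate the design weight by the inverse of the stratum response rate.
        factor = sampled[u["stratum"]] / responded[u["stratum"]]
        weights.append(u["design_weight"] * factor)
    return weights


# Hypothetical sample: one of four small establishments does not respond.
units = [
    {"stratum": "small", "design_weight": 10.0, "responded": True},
    {"stratum": "small", "design_weight": 10.0, "responded": True},
    {"stratum": "small", "design_weight": 10.0, "responded": True},
    {"stratum": "small", "design_weight": 10.0, "responded": False},
    {"stratum": "large", "design_weight": 1.0, "responded": True},
    {"stratum": "large", "design_weight": 1.0, "responded": True},
]
estimation_weights = adjust_weights(units)
```

The respondents' estimation weights add up to the same total as the original design weights, so the sample still represents the full population of each stratum.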
Imputation for non-response
- Imputation is a technique for finding artificial values to replace data that are missing because of non-response.
- The basic consideration is that the replacement value is taken from the observed value of a statistical unit that is closely similar to the non-respondent.
- Imputation is particularly effective for item non-response.
- Many variables of an industrial survey are highly correlated; the mean and ratio of observed units can therefore serve as predictors for non-respondents.
Some imputation methods
- Imputation based on the mean value: missing data are estimated by the mean value of observed units. Effective for homogeneous statistical units, for example within a size class of an industry group at the 4-digit level of ISIC.
- Hot-deck imputation: missing data are replaced by the values of observed units, for which a pool of donors is created. Under the random hot-deck method, donors are selected at random. Alternatively, the donor can be the nearest neighbour; this is called the deterministic hot-deck method.
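As a minimal sketch of the first two methods, the following assumes that the values passed in already belong to one homogeneous class and that missing items are marked with `None`. The function names and the data are hypothetical.

```python
import random
from statistics import mean


def impute_mean(values):
    """Replace None with the mean of the observed values in the same class."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_random_hot_deck(values, rng):
    """Replace None with the value of a donor drawn at random from the observed units."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]


# Hypothetical output values for one size class of one 4-digit ISIC group.
output = [120.0, None, 100.0, 110.0, None]
by_mean = impute_mean(output)
by_hot_deck = impute_random_hot_deck(output, random.Random(0))
```

Mean imputation gives every non-respondent in the class the same value, whereas the random hot-deck draws actually observed values and so keeps some of the variation between units.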
Example: Imputation with nearest neighbour method
Est_ID   Number of employees   Sale     Distance [Abs]   Replacing value
4781     989                   144560
4782     895                   147675   109
4783     786                   …                         128589
4784     771                   128589   15
4785     653                   101868
4786     554                   84762
4787     321                   68150
4788     205                   30135    7
4789     198                   …                         30135
4790     106                   25946    92
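The example can be reproduced with a short sketch. It assumes, as the distance column suggests, that the donor is the respondent whose number of employees is closest in absolute terms to that of the non-respondent. The function name is hypothetical, and the code searches all respondents rather than only the adjacent rows.

```python
# (est_id, employees, sales); sales is None where the establishment did not report it.
records = [
    (4781, 989, 144560),
    (4782, 895, 147675),
    (4783, 786, None),
    (4784, 771, 128589),
    (4785, 653, 101868),
    (4786, 554, 84762),
    (4787, 321, 68150),
    (4788, 205, 30135),
    (4789, 198, None),
    (4790, 106, 25946),
]


def impute_nearest_neighbour(records):
    """Return {est_id: donor sales} for every record with missing sales."""
    donors = [(emp, sales) for _, emp, sales in records if sales is not None]
    imputed = {}
    for est_id, emp, sales in records:
        if sales is None:
            # Donor = respondent with the smallest absolute difference in employees.
            _, donor_sales = min(donors, key=lambda d: abs(d[0] - emp))
            imputed[est_id] = donor_sales
    return imputed


imputed = impute_nearest_neighbour(records)
# imputed == {4783: 128589, 4789: 30135}
```

Establishment 4783 (786 employees) takes its value from 4784 (distance 15 rather than 109), and 4789 (198 employees) from 4788 (distance 7 rather than 92).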
Imputation methods (cont.)
- Cold-deck methods: as opposed to hot-deck (the term refers to punch cards), the cold-deck method is based on past data.
- Post-stratification: statistical units are further stratified into homogeneous groups, from which the mean, median or ratio is computed to replace the missing value.
- Statistical modelling: regression or similar models are constructed, whose coefficients (or parameters) serve as predictors of the missing value.
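The statistical modelling approach can be illustrated with a simple linear regression. This is a sketch only, assuming a single auxiliary variable (number of employees) that is known for all units. The function names and the data are hypothetical, and a real application would check the model fit before using it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def impute_by_regression(employees, sales):
    """Fill None in sales with the value predicted from employees."""
    pairs = [(x, y) for x, y in zip(employees, sales) if y is not None]
    intercept, slope = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    return [intercept + slope * x if y is None else y
            for x, y in zip(employees, sales)]


# Hypothetical data in which sales follow 50 + 10 * employees exactly.
employees = [10, 20, 30, 40]
sales = [150.0, 250.0, None, 450.0]
completed = impute_by_regression(employees, sales)
```

The model is fitted on respondents only, and its coefficients are then applied to the auxiliary variable of the non-respondent to predict the missing value.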
Imputation for unit non-response using external data sources
- Administrative data: in the case of unit non-response there is no information from the survey itself. Instead, data for some key variables might be obtained from administrative sources.
- Data from the previous survey: often termed "carry forward", this replaces the major values with results from the earlier survey. It is effective for quarterly and monthly surveys.
- For annual surveys and surveys with longer intervals, a growth adjustment is necessary.
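A minimal sketch of carry forward with an optional growth adjustment follows. The slide does not say where the growth rate comes from; here it is assumed to be estimated from units that responded in both rounds. The function names and the data are hypothetical.

```python
def growth_rate(previous, current):
    """Growth between two rounds, estimated from units that responded in both."""
    common = [k for k in current if k in previous]
    return sum(current[k] for k in common) / sum(previous[k] for k in common) - 1.0


def carry_forward(previous, current, eligible, growth=0.0):
    """Fill units missing from the current round with their previous value,
    scaled by (1 + growth). growth=0.0 is a plain carry forward."""
    completed = dict(current)
    for unit in eligible:
        if unit not in completed and unit in previous:
            completed[unit] = previous[unit] * (1.0 + growth)
    return completed


# Hypothetical annual survey: establishment C did not respond this year.
previous = {"A": 100.0, "B": 200.0, "C": 50.0}
current = {"A": 110.0, "B": 220.0}
g = growth_rate(previous, current)
completed = carry_forward(previous, current, eligible=["A", "B", "C"], growth=g)
```

For a monthly or quarterly survey the call can leave `growth` at its default of 0.0, which reproduces the plain carry forward.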
Some other points on non-response
- Imputation does not necessarily reduce bias; in a sample survey it may even increase the standard error.
- Unlike household surveys, where ratio and mean estimates are important, industrial surveys are expected to produce totals, such as industrial output and employment.
- Imputation for missing data helps to improve the coverage of the survey estimates.
- Imputation for a large database requires a carefully developed software application.