Published by Concetta Stefani. Modified over 6 years ago.
1
Measurement and Processing Errors
Training Course «Quality Management and Survey Quality Measurement», Rome, 24-27 September 2013
Marcello D’Orazio, Giovanna Brancato, Istat {madorazi,
2
Outline
- Measurement and processing errors
- Response errors
- Causes
- Impact of response errors on final survey estimates
- Evaluating the impact of response errors
- Interviewer effect
- Data treatment and processing errors
3
Measurement and Processing Errors
Measurement errors are deviations between the value of a variable observed for a surveyed unit and the true value of that variable on that unit.
- Response errors: arise in the data collection phase.
- Data treatment and processing errors (measurement errors in a broader sense): arise in the data treatment phase, after data collection. Processing errors also include errors in the procedures used to derive the final survey estimates (totals, averages, proportions, tables, etc.).
4
Causes of Response Errors
- Questionnaire (wording, length, instructions, coding system, …)
- Data collection mode (self-administered or not; paper vs. CASI, …)
- Interviewer (contact, administration of the questions, probing of answers, recording of answers, cheating, …)
- Respondent (understanding of questions, recall of events, willingness to cooperate, …)
These sources are generally addressed separately, although they can also interact.
5
Effect of Response Errors on Final Estimates
When errors are not detected and corrected they may:
- Introduce bias into the final estimates, because of a systematic pattern or direction in the differences between the collected data and the true data.
- Increase the variance of the estimates, because of:
  - Random variation in the collected data (e.g. a respondent answers at random)
  - Correlation among data collected on different units (e.g. the same interviewer influences the answers of different units in the same way: the interviewer effect)
6
Effect of Response Errors on Estimates: Additive Model
- U is the target population, consisting of N units.
- Y is the continuous variable being surveyed.
When a unit k is observed, the recorded value is not the true value but the true value plus two terms:
- A systematic error term
- A random error term
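The symbols on this slide did not survive extraction. As a sketch only, with notation that is assumed here rather than taken from the original ($y_k$ for the observed value, $Y_k$ for the true value, $b_k$ for the systematic error, $e_k$ for the random error), the additive model can be written as:

```latex
y_k = Y_k + b_k + e_k, \qquad k \in U
```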
7
Effect of Response Errors on Estimates: Additive Model (cont.)
In the hypothetical case of many measurements taken on the same unit, independently and under the same conditions:
- Averaging the measurements on the same unit gives the true value plus the systematic error term, because the random error averages out.
- The observed values on the same unit are scattered randomly around this average; their variability is the response variance of the unit.
- Moreover, the values observed on two different units on the same occasion may be related, as measured by their covariance.
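The thought experiment above can be illustrated with a small simulation. This is a minimal sketch, assuming normally distributed random errors; the function name, the parameter names, and the numeric values are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def repeated_measurements(true_value, systematic_error, sd_random, n_repeats, seed=0):
    """Simulate independent measurements of one unit under the additive model.

    Each measurement is: true value + systematic error + random error,
    where the random error is drawn from a normal distribution (an assumption).
    """
    rng = random.Random(seed)
    return [true_value + systematic_error + rng.gauss(0.0, sd_random)
            for _ in range(n_repeats)]


values = repeated_measurements(true_value=50.0, systematic_error=3.0,
                               sd_random=2.0, n_repeats=20000)
mean_obs = statistics.mean(values)       # settles near true value + systematic error
var_obs = statistics.variance(values)    # settles near sd_random ** 2
print(round(mean_obs, 2), round(var_obs, 2))
```

Averaging removes the random term but not the systematic one, which is why repeating a measurement under identical conditions cannot reveal the bias.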
8
Effect of Response Errors on Estimates: Additive Model (cont.)
9
Effect of Response Errors on Estimates: MSE
Objective: estimate the population average.
Assumptions:
- A simple random sample of n units out of N is selected.
- The sampling frame is free of errors (no coverage errors).
- All the units respond (no item or unit nonresponse).
- Only sampling and measurement errors are considered.
If the sample mean is computed on the collected values, the MSE of this estimator can be decomposed into the components listed on the next slide.
10
Effect of Response Errors on Estimates: MSE Components
- Simple response variance: the component of the variance due to random measurement errors when the same unit is observed a number of times.
- Correlated response variance: the component of the variance due to measurement errors, caused by correlation between the values observed on different units.
- Sampling variance.
- Response bias: due to systematic errors when the same unit is observed a number of times.
11
Effect of Response Errors on Estimates: Response Bias
The response bias is the average of the systematic measurement errors over all the units in the population. It disappears when there are no systematic errors, i.e. when the systematic error term is zero for all the units in the population.
12
Effect of Response Errors on Estimates: Simple Response Var.
The simple component of the variance due to measurement errors:
- Is the average of the response variances of all the units, divided by the sample size n.
- In theory, it should decrease as the sample size increases.
- In practice, the response variances tend to increase with increasing sample size.
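The formulas on the MSE slides were lost in extraction. A hedged reconstruction of the overall structure follows, using assumed notation: $\bar{y}$ is the sample mean of the collected values, $b_k$ and $\sigma_k^2$ are the systematic error and the response variance of unit $k$, and SRV, CRV, SV and $B$ are labels chosen here for the simple response variance, the correlated response variance, the sampling variance and the response bias.

```latex
\mathrm{MSE}(\bar{y}) = \mathrm{SRV} + \mathrm{CRV} + \mathrm{SV} + B^2,
\qquad
B = \frac{1}{N}\sum_{k \in U} b_k,
\qquad
\mathrm{SRV} = \frac{1}{n}\,\frac{1}{N}\sum_{k \in U} \sigma_k^2
```

The expressions for $B$ and SRV follow the verbal definitions given on the slides. The exact forms of CRV and SV cannot be recovered from the text and are left unspecified.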
13
Effect of Response Errors on Estimates: Correlated Resp. Var.
The correlated component of the variance due to measurement errors:
- Does not depend on the sample size n.
- Disappears if there is no relationship between the responses provided by different units on the same survey occasion, i.e. when the covariance is zero for each possible pair of units in the population (e.g. surveys with self-administered forms).
- Absence of such a relationship is unrealistic in surveys with interviewers.
14
Effect of Response Errors on Estimates: Response Variance
If the relationship between responses on different units is expressed in relative terms, i.e. as a correlation, the overall response variance can be written as a function of that correlation.
15
Effect of Response Errors on Estimates: Response Variance
Even small values of the correlation between responses can produce a substantial increase in the overall response variance.
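The slide's own formula is missing. A minimal sketch of the effect, assuming the commonly used form in which the simple response variance is multiplied by 1 + (n - 1) * rho, where rho is the correlation between responses of different units; the function and parameter names are hypothetical.

```python
def inflation_factor(n, rho):
    """Ratio between the overall and the simple response variance.

    Assumes the form 1 + (n - 1) * rho, with rho the correlation
    between the responses of two different units.
    """
    return 1.0 + (n - 1) * rho


def response_variance(avg_unit_variance, n, rho):
    """Overall response variance of the sample mean under the same assumption."""
    return (avg_unit_variance / n) * inflation_factor(n, rho)


# With n = 1000, even a correlation of 0.01 inflates the variance about 11 times.
for rho in (0.0, 0.001, 0.01, 0.05):
    print(rho, inflation_factor(1000, rho))
```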
16
Preventing Response Errors
17
Correcting Measurement Errors
During data collection:
- Additional training for interviewers whose performance is not satisfactory
- Call-backs to a sample of respondents to check for possible interviewer cheating
- Correction of deficiencies in tools, procedures, etc.
After data collection:
- Revision of the collected data
- Checking of the data using editing and imputation procedures
Note: editing and imputation can introduce new measurement errors.
18
Assessing Impact of Measurement Errors on Final Survey Estimates
- Record check studies
- Reinterview surveys
- Randomized experiments
19
Record check studies
Comparison of the collected data with similar data (same variables) available in external data sources (other surveys, administrative registers, sampling frames, etc.).
- Reverse record check: a variable already available in the sampling frame is observed in the survey too.
- Forward record check: survey data are linked with additional external data sources.
Linkage is simple if the units come with an (error-free) identification code in both data sources. Otherwise, the risk of linking records that do not refer to the same unit should be evaluated.
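When an error-free identification code is available, the comparison reduces to an exact-key join. The following is a minimal sketch under that assumption; the function name, the identification codes and the data values are hypothetical.

```python
def record_check(survey, register):
    """Link two sources by identification code and compare one variable.

    Both arguments map an identification code to the recorded value.
    Returns the survey-minus-register differences for the linked units
    and the sorted codes that appear only in the survey.
    """
    linked = {uid: survey[uid] - register[uid] for uid in survey if uid in register}
    unlinked = sorted(uid for uid in survey if uid not in register)
    return linked, unlinked


# Hypothetical data: age reported in the survey vs. age in a register.
survey_age = {"A01": 34, "A02": 51, "A03": 28, "A04": 45}
register_age = {"A01": 34, "A02": 50, "A03": 28}

differences, unlinked = record_check(survey_age, register_age)
print(differences, unlinked)
```

Units that cannot be linked are reported separately rather than silently dropped, since a high share of unlinked units would itself limit what the record check can say.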
20
Reinterview Studies
- Usually consist of additional surveys: data collection is repeated on a subsample of the units observed in the main survey.
- Are expensive and are therefore carried out after very important large-scale surveys (e.g. Census).
Different objectives:
- Estimate the response bias
- Estimate the response variance
- Estimate both response bias and variance
21
Types of Reinterview Studies
Repetition of interview:
- Test-retest: the reinterview is carried out independently and under the same conditions --> permits estimation of the response variance
- Repeated measures: the reinterview is carried out in a different mode or under different conditions --> may allow estimation of bias and response variance (requires complex models)
Reinterview with the gold standard: during the reinterview, data are compared with those collected in the main survey; in case of discrepancies the respondent is asked for the “true” response --> permits estimation of the response bias
22
Types of Reinterview Studies (cont.)
Mix of test-retest and reinterview with gold standard. The subsample for the reinterview survey is split randomly into two groups:
- Test-retest is carried out on one group.
- Reinterview with gold standard is carried out on the other group.
This mix permits the estimation of both response bias and response variance.
23
Test-retest reinterview
It is a survey carried out on a (random) subsample of the units that responded to the main survey. The reinterview is carried out:
- Under the same conditions as the main survey
- After a suitable time interval, so that respondents do not remember the responses given on the first survey occasion
The objective is to estimate the simple response variance. The correlated component of the response variance is assumed to be absent (e.g. there are no interviewers).
24
Test-retest reinterview: continuous variables
For a given unit involved in the reinterview study, two independent measurements are available: one from the main survey and one from the reinterview.
Note: the values considered are the responses collected before the data treatment phase.
The two measurements allow the measurement variance to be estimated at unit level.
25
Test-retest reinterview: continuous variables
When the subsample for the reinterview is selected by means of SRS, an estimate of the simple response variance is obtained by substituting each unit-level variance with the corresponding estimate. The resulting expression involves g, the gross difference rate (GDR).
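The estimator itself was lost with the slide image. A minimal sketch, assuming the usual definitions: the unit-level variance is estimated by half the squared difference between the two measurements, and g is the mean squared difference over the reinterviewed units. The function names and data are hypothetical, and the result is the average unit-level response variance (not yet divided by the sample size of the main survey).

```python
def gross_difference_rate(first, second):
    """Mean squared difference between interview and reinterview values."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - b) ** 2 for a, b in zip(first, second)) / len(first)


def average_response_variance(first, second):
    """Estimate of the average unit-level response variance: g / 2."""
    return gross_difference_rate(first, second) / 2.0


# Hypothetical responses of four units in the main survey and in the reinterview.
main_survey = [10, 12, 9, 11]
reinterview = [11, 12, 7, 11]

g = gross_difference_rate(main_survey, reinterview)
srv = average_response_variance(main_survey, reinterview)
print(g, srv)
```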
26
Test-retest reinterview: categorical variables
In the simple case of dichotomous variables, the responses from the main survey and from the reinterview can be cross-classified in a two-way table, from which g can be computed. The smaller g is, the higher the concordance between the responses.
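The table and formula on this slide are missing. As a sketch, assuming the standard 2x2 interview-by-reinterview layout, g for a dichotomous variable is the share of units whose two responses disagree; the cell labels a, b, c, d and the counts are hypothetical.

```python
def gdr_from_table(a, b, c, d):
    """Gross difference rate from a 2x2 interview-by-reinterview table.

    a: yes in both, b: yes then no, c: no then yes, d: no in both.
    With 0/1 coding the squared difference is 1 exactly when the two
    responses disagree, so g is the proportion of off-diagonal cases.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    return (b + c) / n


g_binary = gdr_from_table(a=45, b=2, c=3, d=50)
print(g_binary)
```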
27
Test-retest reinterview: Index of Inconsistency
The size of the simple response variance can be expressed in relative terms by comparing it with the overall variance. The resulting ratio, I, is the index of inconsistency.
Rule of thumb by the US Bureau of the Census:
- Low response variance --> high reliability of responses
- Moderate response variance
- High response variance --> low reliability
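The formula and the thresholds on this slide did not survive extraction. The sketch below assumes the usual dichotomous-variable form of the index, I = g / (p1 * (1 - p2) + p2 * (1 - p1)), and uses 0.20 and 0.50 as cut-offs because these are the values commonly quoted for the Census Bureau rule of thumb. Both the cut-offs and all names are assumptions, not taken from the slide.

```python
def index_of_inconsistency(a, b, c, d):
    """Index of inconsistency from a 2x2 interview-by-reinterview table.

    a: yes in both, b: yes then no, c: no then yes, d: no in both.
    """
    n = a + b + c + d
    g = (b + c) / n        # gross difference rate
    p1 = (a + b) / n       # proportion of "yes" in the main interview
    p2 = (a + c) / n       # proportion of "yes" in the reinterview
    return g / (p1 * (1 - p2) + p2 * (1 - p1))


def reliability_class(index, low=0.20, high=0.50):
    """Classify the response variance; the cut-offs are assumed defaults."""
    if index < low:
        return "low response variance"
    if index <= high:
        return "moderate response variance"
    return "high response variance"


index = index_of_inconsistency(a=45, b=2, c=3, d=50)
label = reliability_class(index)
print(round(index, 4), label)
```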
28
Example of Test-Retest reinterview
Coverage and quality control survey for the 2001 Population and Housing Census in Italy:
- Two-fold objective: estimate undercounts and response variance.
- Random sample of about 1,100 Census Enumeration Areas (EA).
- Reinterview of all the people in the sample EAs (about 172,000 people): 15 questions asked again (mainly categorical variables).
29
Example: the Italian Coverage and Quality Control Survey for 2001 Population and Housing Census
Reliability of responses concerning the variable Gender
30
Example: the Italian Coverage and Quality Control Survey for 2001 Population and Housing Census
Source: D’Orazio (2010)
31
Repeated Measures
If the reinterview is carried out under different conditions with respect to the main survey, the tools and methods of the test-retest reinterview cannot be used: they would provide biased results. In such cases it is necessary to use ad hoc methods.
Latent Class Models (LCM) can handle pairs of measurements obtained under different conditions and can provide an estimate of both bias and response variance. Their use is usually based on very strong assumptions, so the results will be unreliable if the assumptions are wrong.
32
Example: application of LC models to 2001 Census Quality Control Survey
For further details see D’Orazio (2010)
33
Reinterview with the gold standard
The reinterview is carried out under the same conditions as the main survey, but:
- If the responses provided on the two survey occasions differ, the respondent is asked to provide the true response.
- If the responses are equal, no reconciliation is performed and the common value is taken as the true one.
Given the true value obtained in this way for each reinterviewed unit, the response bias is estimated from the differences between the main-survey responses and the true values. The estimate is valid when the units in the control survey are selected by SRS.
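The estimator on this slide is missing. A minimal sketch, assuming it is the mean difference between the main-survey response and the reconciled "true" value over an SRS of reinterviewed units; the function name and data are hypothetical.

```python
def estimated_response_bias(main_values, true_values):
    """Mean of (main-survey response - reconciled true value).

    Assumes the reinterviewed units were selected by simple random sampling.
    """
    if len(main_values) != len(true_values) or not main_values:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(y - t for y, t in zip(main_values, true_values)) / len(main_values)


# Hypothetical data: two units agreed on both occasions, two were reconciled.
main_survey = [10, 12, 9, 13]
gold_standard = [10, 11, 9, 11]

bias = estimated_response_bias(main_survey, gold_standard)
print(bias)
```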
34
Randomized Experiments: the interviewer effect
When measurement errors are caused by interviewers:
- They can be in the same direction (interviewer bias), but the direction and the size of the error can change from interviewer to interviewer.
- They can be random, but the variability changes depending on the interviewer.
Kish’s (1962) additive model, where k denotes the unit and j denotes the interviewer, writes the observed value as the true value plus:
- A response bias due solely to interviewer j
- A random measurement error due to both the respondent and the interviewer
35
Randomized Experiments: the interviewer effect (cont.)
Consider:
- A fixed pool of J interviewers
- The sample of units to be observed is split randomly into J subsamples of equal size
- Each subsample is randomly assigned to an interviewer
Under these conditions the response variance can be expressed in terms of a coefficient that measures the correlation between responses provided by different units interviewed by the same interviewer.
36
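Randomized Experiments: the interviewer effect (cont.)
In this expression, the term involving the correlation between responses collected by the same interviewer denotes the increase of the response variance due to the interviewers and is therefore called the interviewer effect. It disappears if that correlation is zero or if each interviewer carries out a single interview.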
37
Randomized Experiments: the interviewer effect (cont.)
Even small values of the correlation between responses collected by the same interviewer can produce a non-negligible increase of the response variance.
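The numeric illustration on this slide was an image. A minimal sketch, assuming the commonly used form in which the response variance is multiplied by 1 + (m - 1) * rho_int, with m = n / J the workload per interviewer and rho_int the correlation between responses collected by the same interviewer; the names and figures are hypothetical.

```python
def interviewer_inflation(n, n_interviewers, rho_int):
    """Inflation of the response variance due to the interviewers.

    Assumes equal workloads m = n / n_interviewers and the form
    1 + (m - 1) * rho_int.
    """
    workload = n / n_interviewers
    return 1.0 + (workload - 1.0) * rho_int


# 1000 interviews shared among 20 interviewers (50 interviews each).
for rho_int in (0.0, 0.01, 0.02, 0.05):
    print(rho_int, interviewer_inflation(1000, 20, rho_int))
```

Under this form the inflation grows with the workload, so for a fixed sample size a larger pool of interviewers reduces the interviewer effect.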
38
Estimating the interviewer effect: interpenetrating samples
This technique makes it possible to estimate the response variance without carrying out reinterview studies.
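The slide does not show the estimator. One standard option, swapped in here as an illustration rather than taken from the source, is the one-way ANOVA estimator of the intraclass correlation computed on interpenetrated workloads of equal size. All names and data below are hypothetical.

```python
def intra_interviewer_correlation(workloads):
    """One-way ANOVA estimate of the correlation within interviewers.

    workloads: one list of responses per interviewer, all of the same
    size m, obtained by randomly assigning respondents to interviewers.
    Returns (MSB - MSW) / (MSB + (m - 1) * MSW).
    """
    n_int = len(workloads)
    m = len(workloads[0])
    if n_int < 2 or m < 2 or any(len(w) != m for w in workloads):
        raise ValueError("need at least 2 interviewers with equal workloads >= 2")
    means = [sum(w) / m for w in workloads]
    grand_mean = sum(means) / n_int
    # Between-interviewer and within-interviewer mean squares.
    msb = m * sum((mu - grand_mean) ** 2 for mu in means) / (n_int - 1)
    msw = sum((y - mu) ** 2 for w, mu in zip(workloads, means) for y in w) / (n_int * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


# Hypothetical responses collected by three interviewers.
rho_hat = intra_interviewer_correlation([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(round(rho_hat, 4))
```

Because respondents are assigned at random, systematic differences between the interviewers' workload means can be attributed to the interviewers rather than to the respondents.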
39
Estimating the interviewer effect: interpenetrating samples
The interpenetrating technique is relatively simple to implement in telephone surveys. In face-to-face interviews it can produce an uncontrolled increase of the data collection costs, unless a geographic interpenetration is considered.
40
Estimating the interviewer effect: example of geographic interpenetration
41
Estimating the interviewer effect: interpenetrating samples
The interviewer effect can be estimated by applying complex models that account for the hierarchical structure of the data: respondents are “nested” within interviewers. The models considered are multilevel models. Such models can account for interviewer characteristics too. They are complex and their applicability relies on some strong assumptions.
42
Data Treatment and Processing Errors
Different nature: from simple recording errors (transcription or transmission errors) to more complex errors deriving from a misspecification of an edit or imputation model.
Two broad classes:
- Systems errors: deriving from incorrect specification or implementation of the systems needed to carry out surveys and process results
- Data handling errors: arising in the operations used to capture and clean the data
  - Coding (pre-editing)
  - Data entry and key entry
  - Editing and imputation
43
Data Treatment Errors: Coding
Assignment of codes to responses (open-ended responses; codes for geographical areas, economic activity, education, occupation):
- errors in the structure (non-unique codes used for different categories)
- keying errors
- errors in the interpretation of the descriptions (similar to the interviewer effect if coding is performed by coders)
44
Data Treatment Errors: Data Entry and Key Entry Errors
These errors vary according to the type of information collected (numeric vs. character) and the mode of data collection.
Some examples of typical errors:
- amount entered in the wrong measurement unit
- key entry errors involving contiguous keys on the keyboard
- exchange of keys
- skipping a value
45
Data Treatment Errors: Data Entry and Key Entry Errors
Impact of errors:
- Introduced by clerical work --> risk of random and systematic measurement errors
- Introduced by the software --> risk of systematic errors
Assessing the impact:
- Testing of procedures
- Repetition of some operations (double coding, …)
- Ad hoc studies
- Randomized experiments
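Repeating an operation, such as double coding or double key entry, can be assessed by comparing the two passes record by record. A minimal sketch, with hypothetical function names and data:

```python
def double_pass_check(first_pass, second_pass):
    """Compare two independent codings (or keyings) of the same records.

    Returns the disagreement rate and the positions of the records
    on which the two passes differ.
    """
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("need two non-empty sequences of equal length")
    mismatches = [i for i, (a, b) in enumerate(zip(first_pass, second_pass)) if a != b]
    return len(mismatches) / len(first_pass), mismatches


# Hypothetical occupation codes assigned by two coders to the same four responses.
coder_1 = ["A", "B", "C", "A"]
coder_2 = ["A", "B", "D", "A"]

rate, positions = double_pass_check(coder_1, coder_2)
print(rate, positions)
```

Agreement between the two passes does not prove the code is correct, since both coders may make the same systematic error; the disagreement rate is only a lower bound on the error.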
46
Selected references
- Biemer P.P. (2004), “The Twelfth Morris Hansen Lecture. Simple Response Variance: Then and Now”, Journal of Official Statistics, 3, pp
- Biemer P., Forsman G. (1992), “On the quality of reinterview data with application to the Current Population Survey”, Journal of the American Statistical Association, N. 87, pp
- Biemer P.P., Lyberg L.E. (2003), Introduction to Survey Quality. Wiley, New York.
- Brancato G., Fortini M., Pichiorri T. (2001), “On the Bayesian analysis to estimate response error in National Statistical Institutes”. International Conference on Quality, Stockholm, May 2001.
- Brancato G., D’Orazio M., Fortini M. (2004), “Response Error Estimation in Presence of Record Linkage Errors: The Case of the Italian Population Census”. Proceedings of the European Conference on Quality and Methodology in Official Statistics (Q2004), Mainz, Germany, May 2004.
- Cochran W.G. (1977), Sampling Techniques, 3rd ed., Wiley, New York.
- D’Orazio M. (2010), “Evaluating reliability of combined responses through latent class models”. Rivista di Statistica Ufficiale, 1/2010.