Dynamic Populations in Sample Surveys concerning Business Statistics

1 Dynamic Populations in Sample Surveys concerning Business Statistics
Peter Tibert Stoltze
Statistical Methodology
Survey Sampling and Estimation
December 2, 2011

2 Contents
The population as foundation of the survey
Smoothing of sampling fractions over time
Conditional sampling
Sampling frame and target population
What does a stratum shift mean, and how is it correctly reflected at the estimation stage?

3 Phases of a Sample Survey
1. Delimitation of population
2. Stratification and allocation
3. Selection of sample
4. Data collection
5. Data editing and imputation
6. Estimation

4 Phases of a Sample Survey (overview repeated from slide 3)

5 Foundation of the Statistics
The definition of the population is essential for interpreting the statistics.
If we do not have a firm grip on the population, everything else is unimportant!

6 Definition of the Population
The population of interest is the collection of objects in which we are interested
  – Example: All businesses in Denmark
The target population is the part of the population of interest to which we must, for practical reasons, confine our observations
  – Example: All businesses with at least 10 employees
The sampling frame is the data representation of the target population available to us; the sample is drawn from it
  – Example: Extracts from the Central Business Register

7 [Figure: diagram relating the population of interest, the target population, the sampling frame and the sample]

8 Frame Imperfections
The target population and the sampling frame differ because our registers are not perfect
Over-coverage: Businesses that are included in the sampling frame but ought not to be
  – Can be discovered during data collection
  – Example: The business went bankrupt long before the start of the reference period
Under-coverage: Businesses that ought to be included in the sampling frame but are not
  – Can be discovered if we have knowledge of the area from other sources
  – Example: The Lego Group is not included
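The two kinds of frame imperfection can be illustrated as set differences. This is only a sketch: the identifiers are made up, and in practice the target population is not available as a complete list, which is exactly why under-coverage is hard to detect.

```python
# Hypothetical business identifiers; none of them come from the presentation.
target_population = {"A", "B", "C", "D"}   # units that ought to be covered
sampling_frame = {"B", "C", "D", "E"}      # units the register actually lists

# Over-coverage: in the frame, but not part of the target population.
over_coverage = sampling_frame - target_population

# Under-coverage: part of the target population, but missing from the frame.
under_coverage = target_population - sampling_frame

print("over-coverage:", sorted(over_coverage))
print("under-coverage:", sorted(under_coverage))
```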

9 Time of Reference
Stock populations describe the businesses that exist at a given time
  – Example: Survey of pigs
Flow populations describe the businesses that have existed over a given period of time
  – Example: Index of Retail Sales
[Figure: timeline running from the beginning of 2010 to the end of 2010]

10 Decisions about Unit Level
Statistical units
  – Enterprise (ok_no)
  – Kind-of-activity units (fag_no)
  – Local kind-of-activity units (arb_no)
Groups of companies and complex economic units
  – Aggregation of variables of interest
  – Selection of the fundamental unit

11 Dynamics of the Population
It is important to compare the population for survey t with the population for survey t-1 in order to interpret the development in the estimated figures
This can be done by comparing distributions of register-based information by classification
  – Number of units, number of employees and total turnover
  – Distributed by activity and size group as well as by region
The comparison should be built into production, so that no "data archaeology" is needed afterwards (there is rarely time for it!)
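A comparison of this kind could be sketched as follows. The record layout (`activity`, `employees`) and the figures are assumptions made for illustration; a production version would read both frame versions from the register and would also cover turnover, size group and region.

```python
from collections import Counter

def distribution(frame, key):
    """Count units and sum employees for each category of `key`."""
    units, employees = Counter(), Counter()
    for unit in frame:
        units[unit[key]] += 1
        employees[unit[key]] += unit["employees"]
    return units, employees

def compare(previous, current, key):
    """Change from survey t-1 to survey t, per category of `key`."""
    prev_units, prev_emp = distribution(previous, key)
    curr_units, curr_emp = distribution(current, key)
    changes = {}
    for category in sorted(set(prev_units) | set(curr_units)):
        changes[category] = {
            "units_change": curr_units[category] - prev_units[category],
            "employees_change": curr_emp[category] - prev_emp[category],
        }
    return changes

# Hypothetical frame extracts for t-1 and t.
frame_prev = [
    {"activity": "retail", "employees": 12},
    {"activity": "retail", "employees": 30},
    {"activity": "manufacturing", "employees": 50},
]
frame_curr = [
    {"activity": "retail", "employees": 15},
    {"activity": "manufacturing", "employees": 50},
    {"activity": "manufacturing", "employees": 20},
]
changes = compare(frame_prev, frame_curr, "activity")
for category, change in changes.items():
    print(category, change)
```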

12 Phases of a Sample Survey (overview repeated from slide 3)

13 Stratification
Algorithms exist that determine optimum thresholds when stratifying by a continuous variable (e.g. number of employees)
  – Sampling fraction f = 0 (take none)
  – Sampling fraction 0 < f < 1 (take some)
  – Sampling fraction f = 1 (take all)
The stratification need not coincide with the domains for which we want to publish statistics
  – In large surveys, it is frequently worthwhile to refine the stratification slightly
  – As a starting point, it is recommended not to let strata and domains cross each other
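Once thresholds have been fixed, assigning a unit to one of the three regimes is a simple lookup. The sketch below does not compute optimum thresholds. The lower threshold of 10 employees echoes the target-population example on slide 6, while the upper threshold of 250 is purely hypothetical.

```python
def sampling_regime(employees, cutoff=10, take_all_from=250):
    """Return the sampling regime implied by two size thresholds.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    if employees < cutoff:
        return "take none"   # sampling fraction f = 0
    if employees < take_all_from:
        return "take some"   # sampling fraction 0 < f < 1
    return "take all"        # sampling fraction f = 1

for size in (3, 40, 900):
    print(size, sampling_regime(size))
```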

14 Allocation
Conservative, via smoothing of s_h over time

15 Phases of a Sample Survey (overview repeated from slide 3)

16 Selection of the Sample
When population, stratification, and allocation are in place, it is easy to extract the sample with proc surveyselect:

    /* stratified simple random sample; pop must be sorted by stratum */
    proc surveyselect data=pop method=srs seed=1600
                      n=sampsize   /* data set holding the sample size per stratum */
                      out=samp;
      strata stratum;
      id cvrnr;
    run;

However, sometimes there are restrictions or conditions on the selection, e.g.
  – if there are units in the sampling frame that must not be extracted
  – if we want to control overlaps between samples by introducing panels
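For readers without SAS, the same idea (simple random sampling without replacement within each stratum) can be sketched in Python. This is an analogue under assumptions, not a port: the function name and record layout are hypothetical, and the same seed will not reproduce the units that proc surveyselect would draw.

```python
import random
from collections import defaultdict

def stratified_srs(frame, sample_sizes, seed=1600):
    """Draw a simple random sample without replacement in every stratum.

    frame        -- list of dicts with at least the keys "cvrnr" and "stratum"
    sample_sizes -- dict mapping stratum -> n_h (missing strata get n_h = 0)
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in frame:
        by_stratum[unit["stratum"]].append(unit)

    sample = []
    for stratum in sorted(by_stratum):
        units = by_stratum[stratum]
        n_h = min(sample_sizes.get(stratum, 0), len(units))  # never exceed N_h
        sample.extend(rng.sample(units, n_h))
    return sample

# Hypothetical frame: 10 units in stratum A and 5 units in stratum B.
frame = [{"cvrnr": i, "stratum": "A" if i < 10 else "B"} for i in range(15)]
sample = stratified_srs(frame, {"A": 3, "B": 5})
print(len(sample), "units selected")
```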

17 Certain Units Cannot Be Selected…
Sometimes it is only possible to sample from a subset of the frame population, which is then called the survey population
  – Example: Businesses that existed at the time of reference (during the period of reference) but have ceased to exist by the time of the survey
One solution is to remove them from the population from which the sample is extracted and then have the remaining units account for them (they are still included in N_h)
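As a worked illustration with hypothetical numbers (not taken from the presentation): suppose a stratum has N_h = 100 units in the frame, 10 of which cannot be selected, and n_h = 18 units are drawn from the remaining 90. Keeping the removed units in N_h gives the design weight

```latex
\[
w_h = \frac{N_h}{n_h} = \frac{100}{18} \approx 5.56
\qquad\text{rather than}\qquad
\frac{90}{18} = 5 ,
\]
```

so the 18 sampled units also represent the 10 units that could not be selected.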

18 Rotating Panels
Controlled overlaps between samples
  – Respondents gradually learn to fill out the questionnaire correctly (can be interpreted as a reduction of measurement error)
  – Less uncertainty in the estimated change, due to positive correlation between replies at t and t-1
  – Could be part of the co-ordination of several samples

19 Rotating Panels in Research, Development and Innovation
[Table: panels 1 to 11 (rows) by survey year 2007 to 2013 (columns); the cell markings showing which panels take part in which year are not recoverable]

20 Structure of Rotating Panels
With a panel divided into 5 parts, selected units participate for five years as a starting point, and are thereafter "released"
Ideally, this corresponds to an overlap of 80 pct. and a replacement of 20 pct.
However, for this to hold, the population must neither grow nor shrink, and the allocation has to remain unchanged
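The 80/20 split can be checked with a small sketch. It assumes five equally sized panel parts that are numbered consecutively, so that the newest part in one year is number 8 and in the next year number 9 (the numbering used for R&D 2010 on slide 22). The helper name is hypothetical.

```python
def active_panels(newest, parts=5):
    """Panel numbers taking part in a year whose newest panel is `newest`."""
    return set(range(newest - parts + 1, newest + 1))

previous_year = active_panels(8)   # panels 4, 5, 6, 7, 8
current_year = active_panels(9)    # panels 5, 6, 7, 8, 9

# With equally sized panel parts, shares of panels equal shares of units.
overlap_share = len(previous_year & current_year) / len(current_year)
replacement_share = len(current_year - previous_year) / len(current_year)
print(overlap_share, replacement_share)
```

When panel parts differ in size, because the population grows or shrinks or the allocation changes, the shares would have to be computed on units rather than on panel numbers, and the 80 pct. figure no longer holds exactly.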

21 Principle with Stratified Sampling
Allocation n_h at stratum level as usual
Substrata h_i are defined according to the stratum to which the unit belonged last time (a square matrix with a strong diagonal) or whether the unit is new
Allocation n_hi at substratum level by proportional distribution of n_h according to N_hi, with stochastic rounding
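The slide does not say how the stochastic rounding is carried out. One possible scheme, shown here purely as an assumption, rounds each proportional share n_h * N_hi / N_h up or down at random so that the expected value of n_hi equals the exact share and the rounded values still add up to n_h. It uses a single random start and integer arithmetic, as in systematic sampling; the function and substratum names are hypothetical.

```python
import random

def ceil_div(a, b):
    """Ceiling of a / b for integers, valid for negative a as well."""
    return -((-a) // b)

def allocate_substrata(n_h, substratum_sizes, rng):
    """Proportional allocation of n_h over substrata with stochastic rounding.

    substratum_sizes maps substratum -> N_hi (positive integers).
    Each n_hi is the exact share rounded down or up; the total is always n_h.
    """
    total = sum(substratum_sizes.values())   # N_h
    start = rng.randrange(total)              # one random start for all substrata
    allocation = {}
    cumulative = 0
    for key, size in substratum_sizes.items():
        base, remainder = divmod(n_h * size, total)
        # Round up exactly when [cumulative, cumulative + remainder) contains
        # one of the points start, start + total, start + 2 * total, ...
        round_up = (ceil_div(cumulative + remainder - start, total)
                    - ceil_div(cumulative - start, total))
        allocation[key] = base + round_up
        cumulative += remainder
    return allocation

# Hypothetical substrata of one stratum: exact shares are 6.67, 1.90 and 1.43.
sizes = {"stayed": 70, "moved_in": 20, "new": 15}
allocation = allocate_substrata(10, sizes, random.Random(2011))
print(allocation)
```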

22 Principle… (continued)
The four most recent panels (5-8 in R&D 2010) keep their panel number, while all others (including the just-released panel 4) are assigned panel number 0
Within substrata, the units are sorted in random order, but with the highest panel numbers at the top
The first n_hi units are selected for the sample
  – Sometimes we do not succeed in including everyone from panels 5-8 (especially if there are many new units)
  – Selected units in addition to panels 5-8 make up the new panel 9 (which can in principle contain units just released from panel 4)
The principle implies that the design weight at stratum level, N_h/n_h, correctly represents new units and stratum changes
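The selection step within one substratum might look like the sketch below. The record layout and function name are assumptions; the panel numbers follow the R&D 2010 example, with panels 5-8 continuing, all other units carrying panel number 0, and newly selected units forming panel 9.

```python
import random

def select_substratum(units, n_hi, new_panel, rng):
    """Select the first n_hi units after sorting by panel (descending),
    with a random order among units that share a panel number."""
    keyed = [(-unit["panel"], rng.random(), unit) for unit in units]
    keyed.sort(key=lambda item: (item[0], item[1]))

    selected = []
    for _, _, unit in keyed[:n_hi]:
        chosen = dict(unit)                # leave the input records untouched
        if chosen["panel"] == 0:           # not in a continuing panel
            chosen["panel"] = new_panel    # becomes part of the new panel
        selected.append(chosen)
    return selected

# Hypothetical substratum: three units from continuing panels, four others.
units = [
    {"id": 1, "panel": 8}, {"id": 2, "panel": 6}, {"id": 3, "panel": 5},
    {"id": 4, "panel": 0}, {"id": 5, "panel": 0},
    {"id": 6, "panel": 0}, {"id": 7, "panel": 0},
]
selected = select_substratum(units, 5, new_panel=9, rng=random.Random(1))
too_few = select_substratum(units, 2, new_panel=9, rng=random.Random(1))
print([u["panel"] for u in selected])
print([u["panel"] for u in too_few])
```

With n_hi = 2 the unit from panel 5 is left out, which is the situation the slide describes as not succeeding in including everyone from the continuing panels.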

23 Phases of a Sample Survey (overview repeated from slide 3)

24 Data Collection and Data Editing
Data collection and data editing offer potential synergies
  – "Complaints" put forward by respondents can provide a reliable indication of strengths and weaknesses in the questionnaire
A precondition for macro editing is that it must be easy to produce current estimates while data collection is still ongoing

25 Imputation
A certain amount of non-response is unavoidable, but in many strata it can be accounted for at the estimation stage
However, non-response among the largest businesses (from take-all strata) is unacceptable, and extra effort should be made to achieve complete response
  – The best alternative is imputation

26 Phases of a Sample Survey (overview repeated from slide 3)

27 6. Estimation
Estimation based on an updated population
  – Short-term survey (monthly or quarterly)
  – Structural survey (annual)
Design weights and selection probabilities are sacred; stratum changes should be handled by calibration and domain estimation
Estimation may account for cut-off sampling

28 Dynamic Frame Population
[Figure: frame population at Time t, Time t+1 and Time t+2; legend: current version, historic version]

29 Frozen Frame Population
[Figure: frame population at Time t, Time t+1, Time t+2, Time t+3 and Time t+4; legend: current version, historic version, frozen version]

30 Population at Estimation Stage
[Figure: timeline t, t+1, t+2; legend: current version, historic version, frozen version, sample; annotations: estimation of structural survey, estimation of short-term survey]

31 Stratum Changes at Unit Level
Principle: At the estimation stage we discover that a unit selected in stratum h_a with π = 0.1 has moved to stratum h_b
We then have to assume that 9 other (unobserved) units from h_a have made a similar move
Instead of changing the selection probabilities, the new strata are regarded as domains, and calibration is conducted on the basis of the new stratum sizes
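The principle can be illustrated with the simplest form of calibration, post-stratification to the new stratum sizes. This choice is an assumption, since the slide does not specify the calibration method, and all figures other than π = 0.1 are hypothetical. The mover keeps its design weight 1/π = 10, and every unit observed in a new stratum has its weight scaled by the same factor so that the weights add up to the updated stratum size.

```python
from collections import defaultdict

def poststratify(sample, new_stratum_sizes):
    """Scale design weights so that they sum to the updated stratum sizes.

    The design weights themselves are not changed; each unit receives a
    final weight = design weight * (N_new / estimated N) for its new stratum.
    """
    estimated = defaultdict(float)
    for unit in sample:
        estimated[unit["current_stratum"]] += unit["design_weight"]

    calibrated = []
    for unit in sample:
        stratum = unit["current_stratum"]
        factor = new_stratum_sizes[stratum] / estimated[stratum]
        calibrated.append({**unit, "final_weight": unit["design_weight"] * factor})
    return calibrated

# One mover selected in h_a with pi = 0.1 (weight 10), now observed in h_b.
sample = [
    {"id": 1, "current_stratum": "h_b", "design_weight": 10.0},  # the mover
    {"id": 2, "current_stratum": "h_b", "design_weight": 2.0},
    {"id": 3, "current_stratum": "h_b", "design_weight": 2.0},
    {"id": 4, "current_stratum": "h_a", "design_weight": 10.0},
    {"id": 5, "current_stratum": "h_a", "design_weight": 10.0},
    {"id": 6, "current_stratum": "h_a", "design_weight": 10.0},
]
calibrated = poststratify(sample, {"h_a": 27, "h_b": 21})
for unit in calibrated:
    print(unit["id"], unit["current_stratum"], round(unit["final_weight"], 2))
```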

32 Estimation in the Case of Cut-Off Sampling
Cut-off is applied in the step where the target population is determined as a subset of the population of interest
  – Example: Leaving out businesses with fewer than 10 employees
Estimation that accounts for the cut-off can be conducted as mass imputation by means of modelling, but this may amount to an unreasonable extrapolation of the primary data
Auxiliary variables (number of employees or turnover) can be applied, but the result is highly sensitive
  – Example: An erroneously profiled business with 0 employees and a turnover of DKK 2.0 bn.

