An Overview of the Sample Survey Process in Business Statistics
Peter Tibert Stoltze, Statistical Methodology
Survey Sampling and Estimation, November 2014


2 Major Phases of a Sample Survey
1. Overall planning of the survey
2. Constructing the sampling frame
3. Choice of sampling design
4. Selection of sample
5. Data collection
6. Data editing and imputation
7. Estimation

3 Major Phases of a Sample Survey: 1. Overall planning of the survey

4 Overall planning of the survey
- What do we want to find out?
  – Which population and which attributes?
- What do we already know?
  – Is register information or a related survey available (auxiliary information)?
- What precision is required?
  – What sample size is needed, given the spread in the population?
- Planning rests on a number of assumptions, which should be made explicit.
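The question "what sample size is needed for a given spread" can be made concrete with the textbook formula for simple random sampling without replacement. This is a minimal sketch. It assumes SRS rather than the stratified design used later, a guessed population standard deviation `S`, and a required margin of error `e` for the mean. The function name and parameters are illustrative only.

```python
import math


def srs_sample_size(N: int, S: float, e: float, z: float = 1.96) -> int:
    """Smallest n for which a simple random sample without replacement
    estimates the population mean within +/- e at the confidence level
    implied by z (1.96 ~ 95 %).

    Solves z * sqrt((1 - n/N) * S**2 / n) <= e for n.
    """
    n0 = (z * S / e) ** 2        # size ignoring the finite population correction
    n = n0 / (1 + n0 / N)        # apply the finite population correction
    return min(N, math.ceil(n))


# 10,000 businesses, guessed spread S = 50, required margin e = 5
print(srs_sample_size(10_000, 50.0, 5.0))  # 370
```

The answer is only as good as the guess for `S`, which typically comes from register data or an earlier round of the survey. That dependence is one reason the planning assumptions should be stated explicitly.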

5 Major Phases of a Sample Survey: 2. Constructing the sampling frame

6 Foundation of the Statistics
- The definition of the population is essential for interpreting the statistics
- If we do not have a firm grip on the population, nothing else matters!

7 Definition of the Population
- Population of interest: the collection of objects we are interested in
  – Example: all businesses in Denmark
- Target population: the part of the population of interest that, for practical reasons, we must confine ourselves to observing
  – Example: all businesses with at least 10 employees
- Sampling frame: the data representation of the target population available to us; the sample is drawn from it
  – Example: extracts from the Central Business Register

8 [Figure: population of interest, target population, sampling frame and sample]

9 Frame Imperfections
- The difference between the target population and the sampling frame arises because our registers are not perfect
- Over-coverage: businesses that are included in the sampling frame but ought not to be
  – Can be discovered during data collection
  – Example: the business went bankrupt long before the start of the reference period
- Under-coverage: businesses that ought to be included in the sampling frame but are not
  – Can be discovered if we have knowledge of the area from other sources
  – Example: The Lego Group is not included
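If another source provides a list, even a partial one, of the units that belong to the target population, then over- and under-coverage are simply set differences. This is a minimal sketch with hypothetical unit identifiers. In practice, over-coverage is usually detected during data collection rather than from a complete list.

```python
def coverage_errors(frame_ids, target_ids):
    """Return (over_coverage, under_coverage) as sets of unit identifiers.

    over-coverage:  in the sampling frame but not in the target population
    under-coverage: in the target population but not in the sampling frame
    """
    frame, target = set(frame_ids), set(target_ids)
    return frame - target, target - frame


# Hypothetical identifiers: "B3" went bankrupt, "N9" is missing from the frame
over, under = coverage_errors(["A1", "A2", "B3"], ["A1", "A2", "N9"])
print(sorted(over), sorted(under))  # ['B3'] ['N9']
```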

10 Time of Reference
- Stock populations describe the businesses that exist at a given point in time
  – Example: survey of pigs
- Flow populations describe the businesses that have existed during a given period
  – Example: Index of Retail Sales
[Figure: timeline from the beginning of 2010 to the end of 2010]
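The stock/flow distinction can be expressed as two membership tests on a unit's start and end dates. This sketch makes several assumptions: the `start`/`end` fields are hypothetical register dates, `end=None` means the unit is still active, and both dates count as days of activity.

```python
from datetime import date
from typing import Optional


def in_stock_population(start: date, end: Optional[date], at: date) -> bool:
    """Unit is active on the reference date `at`."""
    return start <= at and (end is None or end >= at)


def in_flow_population(start: date, end: Optional[date],
                       period_start: date, period_end: date) -> bool:
    """Unit is active on at least one day of [period_start, period_end]."""
    return start <= period_end and (end is None or end >= period_start)


# A business active from March to June 2010
s, e = date(2010, 3, 1), date(2010, 6, 30)
print(in_stock_population(s, e, date(2010, 12, 31)))                   # False
print(in_flow_population(s, e, date(2010, 1, 1), date(2010, 12, 31)))  # True
```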

11 Decisions about Unit Level
- Statistical units
  – Enterprise (ok_no)
  – Kind-of-activity units (fag_no)
  – Local kind-of-activity units (arb_no)
- Groups of companies and complex economic units
  – Aggregation of variables of interest
  – Selection of the fundamental unit

12 Dynamics of the Population
- It is important to compare the population for survey t with the population for survey t-1 in order to interpret the development in the estimated figures
- This can be done by comparing distributions of register-based information across classifications
  – Number of units, number of employees and total turnover
  – Distributed by activity and size group as well as by region
- Build this into production, so that it is not necessary to conduct "data archaeology" (there is rarely time for it!)
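One way to build such a comparison into production is a small routine that tabulates both populations and reports the differences. This sketch assumes each unit is a dict with the hypothetical keys `activity`, `size_group`, `employees` and `turnover`. A `region` key could be added to the grouping in the same way.

```python
from collections import defaultdict


def tabulate(population):
    """Number of units, employees and turnover per (activity, size_group)."""
    table = defaultdict(lambda: [0, 0, 0.0])
    for unit in population:
        row = table[(unit["activity"], unit["size_group"])]
        row[0] += 1
        row[1] += unit["employees"]
        row[2] += unit["turnover"]
    return table


def compare_populations(pop_prev, pop_curr):
    """Change from survey t-1 to survey t per cell: (units, employees, turnover)."""
    prev, curr = tabulate(pop_prev), tabulate(pop_curr)
    empty = [0, 0, 0.0]
    return {
        key: tuple(c - p for c, p in zip(curr.get(key, empty), prev.get(key, empty)))
        for key in sorted(set(prev) | set(curr))
    }
```

A new activity/size cell, or a large jump in one cell, points to a population change that will also show up in the estimated figures.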

13 Major Phases of a Sample Survey: 3. Choice of sampling design

14 Sampling Design
- Because of rich sampling frames and the mode of collection (web questionnaire), the most commonly used design is stratified simple random sampling (STSI)
- If we did not have a rich sampling frame, or if data were collected by personal interview, we would probably use cluster sampling…

15 Stratification
- There are algorithms that determine optimal stratum boundaries when stratifying on a continuous variable (e.g. number of employees)
  – Sampling fraction f = 0 (take none)
  – Sampling fraction 0 < f < 1 (take some)
  – Sampling fraction f = 1 (take all)
- The stratification need not coincide with the domains for which we want to publish statistics
  – For large surveys it is often worthwhile to refine the stratification slightly
  – As a starting point, it is recommended that strata and domains do not cut across each other…
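The slide does not say which algorithm is used. One classic choice for the take-some part is the Dalenius-Hodges cumulative √f rule. It groups the stratification variable into narrow classes, cumulates the square roots of the class frequencies, and cuts that scale into equal parts. The function name and the default of 50 classes are assumptions. Take-all units (f = 1) would typically be set aside before the rule is applied.

```python
import math


def cum_sqrt_f_boundaries(values, n_strata, n_classes=50):
    """Dalenius-Hodges cumulative sqrt(f) rule.

    Returns up to n_strata - 1 stratum boundaries (upper class limits)
    for a continuous stratification variable such as number of employees.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return []
    width = (hi - lo) / n_classes
    freq = [0] * n_classes
    for v in values:
        freq[min(int((v - lo) / width), n_classes - 1)] += 1

    # cumulative sum of sqrt(frequency) per class
    cum, running = [], 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)

    boundaries = []
    for j in range(1, n_strata):
        target = j * running / n_strata
        # class whose cumulative sqrt(f) is closest to the j-th equal share
        k = min(range(n_classes), key=lambda i: abs(cum[i] - target))
        b = lo + (k + 1) * width
        if not boundaries or b > boundaries[-1]:
            boundaries.append(b)
    return boundaries
```

For a uniformly spread variable, the rule simply halves the range. For the skewed size distributions typical of business populations, it places the boundaries much closer to the small units.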

16 Allocation

17 Allocation
- Conservative, via smoothing of s_h over time
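The allocation formulas themselves did not survive in this transcript. As an illustration only, this sketch uses Neyman allocation, with n_h proportional to N_h·S_h. Strata whose share would exceed N_h are moved to take-all, and the standard deviations s_h are smoothed over successive surveys. Both the choice of Neyman allocation and exponential smoothing with constant `alpha` are assumptions, not the presenter's stated method.

```python
def smooth_sd(history, alpha=0.5):
    """Exponentially smoothed s_h over successive surveys (oldest first)."""
    s = history[0]
    for value in history[1:]:
        s = alpha * value + (1 - alpha) * s
    return s


def neyman_allocation(N, S, n):
    """Allocate n over strata proportionally to N_h * S_h.

    Strata whose share reaches N_h become take-all (n_h = N_h) and the rest
    of the sample is reallocated among the remaining strata. Results are
    rounded, so the total may differ slightly from n.
    """
    H = len(N)
    alloc = [0.0] * H
    take_all = set()
    while True:
        free = [h for h in range(H) if h not in take_all]
        remaining = n - sum(N[h] for h in take_all)
        denom = sum(N[h] * S[h] for h in free)
        if not free or remaining <= 0 or denom == 0:
            for h in free:
                alloc[h] = 0.0
            break
        newly_full = set()
        for h in free:
            alloc[h] = remaining * N[h] * S[h] / denom
            if alloc[h] >= N[h]:
                newly_full.add(h)
        if not newly_full:
            break
        take_all |= newly_full
    for h in take_all:
        alloc[h] = N[h]
    return [round(a) for a in alloc]


# Smoothed s_h from three strata's history, then allocate n = 200
s = [smooth_sd([9.0, 11.0, 10.0]), smooth_sd([38.0, 42.0]), smooth_sd([900.0, 1100.0])]
print(neyman_allocation([1000, 500, 20], s, 200))  # [60, 120, 20]
```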

18 Major Phases of a Sample Survey: 4. Selection of sample

19 Selection of the Sample
- When population, stratification and allocation are in place, the sample is easily drawn with PROC SURVEYSELECT:

```sas
/* SURVEYSELECT expects pop (and sampsize) sorted by the STRATA variable */
proc sort data=pop;
  by stratum;
run;

/* sampsize: one row per stratum with the sample size in _NSIZE_ or SampleSize */
proc surveyselect data=pop method=srs seed=1600 n=sampsize out=samp;
  strata stratum;
  id cvrnr;
run;
```

- However, there are sometimes restrictions or conditions on the selection, e.g.
  – units in the sampling frame that must not be selected
  – control of the overlap between samples by means of panels

20 Certain Units Cannot Be Selected…
- Sometimes it is only possible to select from a subset of the frame population, which is then called the survey population
  – Example: businesses that existed at the time of reference (during the reference period) but have ceased to exist by the time of the survey
- These must be removed from the sampling frame, but the remaining units must still account for them (they are included in N_h)
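The key point is that units are drawn from the survey population, while N_h, and hence the design weight N_h/n_h, is still counted on the full frame. This Python sketch assumes a frame given as hypothetical `(unit_id, stratum)` pairs and a set of unit ids that must not be selected. The seed mirrors the SAS example, but Python's generator will draw a different sample.

```python
import random


def select_stsi(frame, n_h, excluded, seed=1600):
    """Stratified simple random sampling from the survey population.

    frame:    iterable of (unit_id, stratum) for the whole frame population
    n_h:      dict stratum -> planned sample size
    excluded: unit_ids that must not be selected (e.g. ceased businesses)
    Returns a list of (unit_id, stratum, design_weight).
    """
    rng = random.Random(seed)
    strata = {}
    for unit_id, stratum in frame:
        strata.setdefault(stratum, []).append(unit_id)

    sample = []
    for h in sorted(strata):
        units = strata[h]
        N = len(units)  # includes the excluded units
        eligible = [u for u in units if u not in excluded]
        n = min(n_h.get(h, 0), len(eligible))
        if n == 0:
            continue
        for u in rng.sample(eligible, n):
            # the remaining units also account for the excluded ones
            sample.append((u, h, N / n))
    return sample
```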

21 Rotating Panels
- Controlled overlap between samples
  – Respondents gradually learn to fill out the questionnaire correctly (which can be interpreted as a reduction in measurement error)
  – Less uncertainty about estimates of change, due to the positive correlation between responses at t and t-1
  – Can be part of the coordination of several samples

22 Rotating panels in Research, Development and Innovation
[Figure: panels 1-11 by survey year, 2007-2013]

23 Structure of Rotating Panels
- With the panel divided into 5 parts, selected units in principle participate for five years, after which they are "released"
- Ideally this corresponds to an overlap of 80% and a replacement of 20% from one year to the next
- For this to hold exactly, however, the population must neither grow nor shrink, and the allocation must remain unchanged
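The 80/20 figure follows from counting panels. This sketch assumes that one new panel of equal size enters each year and that each panel stays for five surveys. The panels are given hypothetical labels by entry year. Under these assumptions, with a static population and unchanged allocation, the overlap between consecutive samples is 4/5.

```python
def active_panels(year, first_year, span=5):
    """Panels (labelled by entry year) that are in the sample in `year`."""
    return {p for p in range(first_year, year + 1) if year - p < span}


def overlap(year, first_year, span=5):
    """Share of this year's panels that were also in last year's sample
    (equals the unit overlap when all panels have the same size)."""
    previous = active_panels(year - 1, first_year, span)
    current = active_panels(year, first_year, span)
    return len(previous & current) / len(current)


print(overlap(2012, 2000))  # 0.8
```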

24 Principle with Stratified Sampling
- Allocation n_h at stratum level as usual
- Substrata hi are defined by the stratum the unit belonged to last time (a square matrix with a strong diagonal), or by whether the unit is new
- Allocation n_hi at substratum level by distributing n_h proportionally to N_hi, with stochastic rounding
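The slide does not specify the rounding scheme. One scheme with the required properties is systematic rounding with a single random start. Each n_hi is the floor or ceiling of n_h·N_hi/N_h, its expected value equals the proportional share, and the n_hi always add up to n_h. This sketch uses exact fractions so that floating-point error cannot change the total. The names are illustrative.

```python
import math
import random
from fractions import Fraction


def stochastic_round_allocation(n_h, N_hi, rng):
    """Split n_h over substrata proportionally to N_hi.

    Each n_hi is floor or ceil of n_h * N_hi / N_h with the proportional
    share as its expectation, and the n_hi always sum to n_h
    (systematic rounding with one random start u in [0, 1)).
    """
    N_h = sum(N_hi)
    u = Fraction(rng.random())        # exact conversion of the float
    result, cum, prev = [], 0, 0      # prev = floor(u) = 0
    for N in N_hi:
        cum += N
        cur = math.floor(Fraction(n_h * cum, N_h) + u)
        result.append(cur - prev)
        prev = cur
    return result


# Shares 3, 4.5 and 2.5: the result is [3, 4, 3] or [3, 5, 2]
print(stochastic_round_allocation(10, [30, 45, 25], random.Random(2014)))
```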

25 Principle… (continued)
- The four most recent panels (5-8 in R&D 2010) keep their panel number, while all others (including the just-released panel 4) are assigned panel number 0
- Within each substratum the units are sorted in random order, but with the highest panel numbers at the top
- The first n_hi units are selected for the sample
  – Sometimes we do not manage to include everyone from panels 5-8 (especially if there are many new units)
  – Selected units other than those from panels 5-8 make up the new panel 9 (which can, in principle, contain units just released from panel 4)
- This principle ensures that the design weight at stratum level, N_h/n_h, correctly represents new units and stratum changes
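Read as an ordering rule, the procedure sorts each substratum by panel number (highest first), with random order within each panel, and takes the first n_hi units. This sketch follows that reading. The tuple layout, the function name and the default panel numbers (taken from the R&D 2010 example) are assumptions.

```python
import random


def select_in_substratum(units, n_hi, rng, keep_panels=(5, 6, 7, 8), new_panel=9):
    """Select n_hi units from one substratum.

    units: list of (unit_id, panel); panels outside keep_panels count as 0.
    Units are ordered by panel number (highest first), in random order
    within each panel, and the first n_hi are taken. Selected units that
    are not in keep_panels are assigned to new_panel.
    """
    keyed = []
    for unit_id, panel in units:
        p = panel if panel in keep_panels else 0
        keyed.append((-p, rng.random(), unit_id, p))
    keyed.sort()
    return [(unit_id, p if p else new_panel) for _, _, unit_id, p in keyed[:n_hi]]


# ("a", 8) and ("d", 5) come first, then one panel-0 unit moved to panel 9
units = [("a", 8), ("b", 4), ("c", None), ("d", 5), ("e", 0)]
print(select_in_substratum(units, 3, random.Random(1)))
```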

26 Major Phases of a Sample Survey: 5. Data collection, 6. Data editing and imputation

27 Data Collection and Data Editing
- Data collection and data editing have potential synergies
  – "Complaints" from respondents can give a reliable indication of the strengths and weaknesses of the questionnaire
- A precondition for macro editing is that it must be easy to produce current estimates while data collection is still ongoing

28 Imputation
- A certain amount of non-response is unavoidable, but in many strata it can be accounted for at the estimation stage
- However, non-response among the largest businesses (in take-all strata) is unacceptable, and extra efforts should be made to achieve complete response
  – Where that fails, the best alternative is imputation
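The slide does not prescribe an imputation method. One common choice for large businesses is ratio imputation from an auxiliary variable known for all units, such as register turnover or the unit's value in the previous survey. This minimal sketch assumes records with the hypothetical keys `x` (auxiliary) and `y` (reported value, `None` if missing), imputed within a single stratum.

```python
def ratio_impute(records):
    """Impute missing y as R * x, where R = sum(y) / sum(x) over respondents.

    Returns new records with an added 'imputed' flag.
    """
    respondents = [r for r in records if r["y"] is not None]
    if not respondents:
        raise ValueError("no respondents to estimate the ratio from")
    ratio = sum(r["y"] for r in respondents) / sum(r["x"] for r in respondents)
    return [
        dict(r, y=r["y"] if r["y"] is not None else ratio * r["x"],
             imputed=r["y"] is None)
        for r in records
    ]


print(ratio_impute([{"x": 100, "y": 110}, {"x": 200, "y": 220}, {"x": 50, "y": None}]))
```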

29 Major Phases of a Sample Survey: 7. Estimation

30 Estimation
- Estimation is based on an updated population
  – Short-term surveys (monthly or quarterly)
  – Structural surveys (annual)
- Design weights and selection probabilities are sacred; stratum changes should be handled by calibration and domain estimation
- Estimation may account for cut-off sampling
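Under STSI, the design-weighted estimator of a total is the sum over strata of N_h times the stratum sample mean. Its variance estimator carries the finite population correction (1 - n_h/N_h), which is zero in take-all strata. This sketch assumes complete response and hypothetical input dicts. Non-response, stratum changes and cut-off are left out.

```python
from statistics import mean, variance


def stsi_total(samples, N):
    """Estimate a population total under STSI.

    samples: dict stratum -> list of observed y values
    N:       dict stratum -> stratum size N_h
    Returns (estimated total, estimated variance).
    """
    total, var = 0.0, 0.0
    for h, ys in samples.items():
        n_h, N_h = len(ys), N[h]
        total += N_h * mean(ys)  # design weight N_h/n_h times the stratum sum
        if n_h > 1:
            # (1 - n_h/N_h) is the finite population correction; 0 in take-all strata
            var += N_h ** 2 * (1 - n_h / N_h) * variance(ys) / n_h
    return total, var


est, v = stsi_total({"A": [1, 2, 3], "B": [10, 10]}, {"A": 30, "B": 2})
print(est, v)
```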

31 Dynamic Frame Population
[Figure: frame population at times t, t+1 and t+2; current version and historic version]

32 Frozen Frame Population
[Figure: frame population at times t to t+4; current, historic and frozen versions]

33 Population at Estimation Stage
[Figure: current, historic and frozen versions of the frame population at t, t+1 and t+2, together with the sample; used for estimation of the structural survey and of the short-term survey]

34 Stratum Changes at Unit Level
- Principle: at the estimation stage we discover that a unit selected in stratum h_a with π = 0.1 has moved to stratum h_b
- We must then assume that 9 other (unobserved) units from h_a have made a similar move, since the unit represents 1/π = 10 units
- Instead of changing the selection probabilities, the new strata are treated as domains, and calibration is carried out on the basis of the new stratum sizes
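The simplest form of this calibration is post-stratification on the new strata. Design weights stay as selected, and within each new stratum (domain) they are scaled so that the weighted count equals the updated stratum size. This sketch assumes hypothetical `(unit_id, design_weight, new_stratum)` tuples. A full calibration, for example GREG with several auxiliary variables, would generalise it.

```python
def calibrate_to_new_strata(sample, new_sizes):
    """Scale design weights so that, in each new stratum g,
    the sum of weights equals new_sizes[g]. Selection probabilities are untouched."""
    weight_sums = {}
    for _, d, g in sample:
        weight_sums[g] = weight_sums.get(g, 0.0) + d
    return [(u, d * new_sizes[g] / weight_sums[g], g) for u, d, g in sample]


# Unit "c" was drawn in h_a with pi = 0.1 (weight 10) and has moved to h_b
sample = [("a", 5.0, "h_b"), ("b", 5.0, "h_b"), ("c", 10.0, "h_b"), ("d", 2.0, "h_a")]
print(calibrate_to_new_strata(sample, {"h_b": 18, "h_a": 2}))
# [('a', 4.5, 'h_b'), ('b', 4.5, 'h_b'), ('c', 9.0, 'h_b'), ('d', 2.0, 'h_a')]
```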

35 Estimation in the Case of Cut-Off Sampling
- Cut-off is applied at the step where the target population is defined as a subset of the population of interest
  – Example: leaving out businesses with fewer than 10 employees
- The cut-off part can be accounted for by model-based mass imputation, but this may mean an unreasonable extrapolation of the primary data
- Auxiliary variables (number of employees or turnover) can be used, but the result is highly sensitive to errors in them
  – Example: an erroneously profiled business with 0 employees and a turnover of DKK 2.0 bn
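A simple model-based way to account for the cut-off is a ratio estimator. It estimates the variable of interest per unit of the auxiliary variable and applies that ratio to the auxiliary total of the cut-off part. The sketch below uses hypothetical numbers, with turnover in DKK m as the auxiliary. It shows the sensitivity noted on the slide: one mis-profiled unit in the auxiliary total feeds straight into the estimate. Estimating the ratio from the smallest sampled units is an assumption.

```python
def cutoff_estimate(sample_y, sample_x, cutoff_x_total):
    """Ratio estimate of the y-total for units below the cut-off.

    The ratio sum(y) / sum(x) is estimated from sampled units just above
    the cut-off and applied to the auxiliary total of the cut-off part.
    """
    ratio = sum(sample_y) / sum(sample_x)
    return ratio * cutoff_x_total


y, x = [5.0, 15.0], [40.0, 120.0]       # ratio 0.125
print(cutoff_estimate(y, x, 500.0))      # 62.5
# Same cut-off part plus one unit wrongly profiled with 0 employees
# and DKK 2.0 bn (2,000 m) turnover
print(cutoff_estimate(y, x, 2500.0))     # 312.5
```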

36 Calibration for non-response
- Done using SAS and the CLAN macros (available from Statistics Sweden)
- A free alternative is the survey package for R (demonstration)
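The core of a linear (GREG-type) calibration is small enough to write out. This sketch calibrates the respondents' weights d so that they reproduce a known population count N and a known auxiliary total X, such as register turnover. It uses the chi-square distance, which gives w_i = d_i(1 + λ1 + λ2·x_i). This corrects for non-response to the extent that the auxiliary variable explains it. It is an illustration under these assumptions, not the CLAN implementation. In R, the survey package's `calibrate()` function covers this and more general cases.

```python
def linear_calibration(d, x, N, X):
    """Calibrate weights d so that sum(w) == N and sum(w * x) == X,
    using the linear (chi-square distance) calibration behind GREG."""
    a = sum(d)
    b = sum(di * xi for di, xi in zip(d, x))
    c = sum(di * xi * xi for di, xi in zip(d, x))
    r1, r2 = N - a, X - b
    det = a * c - b * b
    if det == 0:
        raise ValueError("auxiliary variable is constant among respondents")
    # solve [[a, b], [b, c]] @ [lam1, lam2] = [r1, r2]
    lam1 = (r1 * c - b * r2) / det
    lam2 = (a * r2 - b * r1) / det
    return [di * (1 + lam1 + lam2 * xi) for di, xi in zip(d, x)]


w = linear_calibration([10.0, 10.0, 10.0], [1.0, 2.0, 3.0], N=40.0, X=90.0)
print(w)
```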

