
FDI - Imputation. Overview Introduction Overview of Imputation Methods Overview of Outliering methods Overview of Estimation methods Aggregation Disclosure.


1 FDI - Imputation

2 Overview Introduction; Overview of Imputation Methods; Overview of Outliering methods; Overview of Estimation methods; Aggregation; Disclosure; Quality information

3 Results process Validation, Analysis, Imputation, Outliering, Estimation, Aggregation, Disclosure, Outputs

4 Methodology review of methods for FDI Methods were reviewed in 2011 as part of ESA10. Changes in international regulations require changes to the FDI questionnaire, and to the survey data take-on and processing system. This is an opportunity to harmonise the Annual and Quarterly FDI methods and improve data quality.

5 What is imputation? Imputation is defined as “A procedure for entering a value for a specific data item where the response is missing or unusable” (UNECE Glossary of Terms). In practice, imputation is a way to estimate a value for a non-responder or for an unusable response. A response may be unusable because of errors or inconsistencies, for example.

6 Types of non-response There are two types of non-response: complete and partial. These are known respectively as unit non-response and item non-response. Unit non-response occurs when a respondent answers no survey questions. Item non-response occurs when a respondent answers some but not all survey questions.

7 Avoiding non-response Ideally, non-response should be avoided completely by solving the issues which cause it: negative attitude towards ONS; problems contacting the ONS; problems with questionnaire design; problems with timing, burden, sensitivity etc. However, the reality is that non-response always occurs in sample surveys.

8 Dealing with non-response Once it has occurred, non-response can be dealt with in the following ways: do nothing; re-contact; imputation; or, more subjectively, manual construction.

9 When CORA uses the different imputation methods
Method | When applied | Questions applied to | Question descriptions
Ratio of means imputation | Annual and Qtr | 1011, 1012, 1111, 1112, 1211, 1212, 1311, 1312, 1321, 1322, 3412, 3422, 3712, 3722 | Profit/loss, tax credits, closing balances
Default to zero | Annual and Qtr | 2039, 2111, 2112, 2121, 2122, 2211, 2212, 2221, 2222, 2611, 2612, 2621, 2622 | Exceptional dividends, acquisitions and disposals, and increases and decreases in equity
Copy forward previous period | Annual and Qtr | 3191 (impute prev 3192), 3291 (impute prev 3292), 3411 (impute prev 3412), 3421 (impute prev 3422), 3691 (impute prev 3692), 3711 (impute prev 3712), 3721 (impute prev 3722) | Opening balances
Impute a median value | Qtr | 2019 | Ordinary dividend

10 Ratio of Means This is the main imputation method. It is used for the profit/loss, tax credits and closing balances questions. The next few slides walk through how the Ratio of Means is calculated. It is important to note that the calculations will all be done in CORA.

11 How Ratio of Means is Calculated For each question (relevant for Ratio of Means), group the question data by company type (i.e. branch or subsidiary) and by industry. Sum the question responses for each company within the group for the current period. Sum the question responses for each company within the group for the previous period. Current period question total / Previous period question total = question ratio

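12 Ratio of Means Example
Previous period (total of responders 1 to 3: 62)
Company A | Country | Industry | Value
1 | US | 692 | 2
2 | US | 610 | 10
3 | FR | 590 | 50
Company B | FR | 620 | 40
Current period (total of responders 1 to 3: 45)
Company A | Country | Industry | Value
1 | US | 692 | 6
2 | US | 610 | 4
3 | FR | 590 | 35
Company B | FR | 620 | ?
45 / 62 = 0.73 (ratio)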

13 Ratio of Means Example Using the same data as slide 12, Ratio = 0.73. Company B (FR, industry 620) reported 40 in the previous period and has no current period value, so its imputed current period value is 40 x 0.73 = 29.2
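The arithmetic in this example can be reproduced with a short Python sketch. This is only an illustration of the calculation as the slides describe it, not the CORA implementation; the function name `ratio_of_means` is hypothetical. Note that the slide multiplies by the ratio rounded to two decimal places (giving 29.2), while the unrounded ratio gives about 29.03; the slides do not say which CORA uses.

```python
def ratio_of_means(current_values, previous_values):
    """Ratio of the current-period group total to the previous-period group total."""
    previous_total = sum(previous_values)
    if previous_total == 0:
        raise ValueError("previous-period total is zero, so the ratio is undefined")
    return sum(current_values) / previous_total


# Responding companies 1 to 3 from the example slide.
previous = [2, 10, 50]  # previous-period total: 62
current = [6, 4, 35]    # current-period total: 45

ratio = ratio_of_means(current, previous)

# Company B reported 40 in the previous period and nothing in the current one.
imputed_unrounded = 40 * ratio
imputed_as_on_slide = 40 * round(ratio, 2)

print(round(ratio, 2))                # 0.73
print(round(imputed_unrounded, 2))    # 29.03
print(round(imputed_as_on_slide, 1))  # 29.2
```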

14 Application of Ratio of Means The previous slides show how the ratio is created, but how is it applied? Where the company responded in the last period: if the previous response was a positive number, multiply that response by the ratio to create an imputed value; if the previous response was a negative number, the current period value is set to 0.

15 Application of Ratio of Means Where the company did not respond in the last period: the ratio of means is not used, as there is no previous value to apply the ratio to. A trimmed mean is used to impute instead.
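The application rules on slides 14 and 15 could be sketched as a single function. The function and parameter names are hypothetical, and the trimmed mean is passed in as an already-computed number because the slides do not say how it is trimmed. The slides also do not cover a previous response of exactly zero; in this sketch it simply yields zero.

```python
from typing import Optional


def impute_from_ratio(previous_response: Optional[float],
                      ratio: float,
                      trimmed_mean: float) -> float:
    """Impute a current-period value following the rules described in the slides.

    previous_response is None when the company did not respond last period.
    trimmed_mean is an assumed, pre-computed fallback value.
    """
    if previous_response is None:
        # No previous value to apply the ratio to: fall back to the trimmed mean.
        return trimmed_mean
    if previous_response < 0:
        # Negative previous response: current-period value is set to 0.
        return 0.0
    # Positive (or zero) previous response: scale it by the ratio.
    return previous_response * ratio


print(impute_from_ratio(40, 0.5, trimmed_mean=12.0))    # 20.0
print(impute_from_ratio(-15, 0.5, trimmed_mean=12.0))   # 0.0
print(impute_from_ratio(None, 0.5, trimmed_mean=12.0))  # 12.0
```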

16 Checking! It is important to check the ratio values that are output. A big ratio means that there is a big difference between the aggregated totals for the current period and the previous period. Check the data: is a big company's data missing, or are the units incorrect?

17 Copy forward previous period Used to move closing balances from the previous period into the opening balances for the new period. Method: if the respondent has not completed a question, the system looks for data in the previous period. If it finds data for the missing question, the previous period's data is copied forward. If no value is there, it calculates the median value instead.

18 Copy forward previous period
Previous period
Company A | Country | Industry | Value
1 | US | 692 | 2
2 | US | 610 | 10
3 | FR | 590 | 50
Company B | FR | 620 | 40
Current period
Company A | Country | Industry | Value
1 | US | 692 | 2
2 | US | 610 | 10
3 | FR | 590 | 50
Company B | FR | 620 | ? (copied forward: 40)
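A minimal Python sketch of the copy-forward rule, with its median fallback, is given below. The function name is hypothetical, and the choice to take the median over the other current-period responses to the same question is an assumption, since the slides do not say which values the median is taken over.

```python
from statistics import median


def copy_forward(previous_value, current_responses):
    """Return the previous period's value if one exists, otherwise a median.

    previous_value is None when no previous-period data was found.
    current_responses (an assumption) holds the other responses to the question.
    """
    if previous_value is not None:
        return previous_value
    return median(current_responses)


# Company B in the example: previous value 40 is copied into the blank cell.
print(copy_forward(40, [2, 10, 50]))    # 40
# With no previous value, the sketch falls back to the median of the responses.
print(copy_forward(None, [2, 10, 50]))  # 10
```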

19 Median Imputation Used to impute for the Ordinary dividends questions. Method: the system orders the question values by size, counts the number of observations for the question and identifies the middle number (the median). It then imputes the blank cell with that middle value.

20 Example for Median Imputation Data for question 3272
As reported: 20, 46, 6, 10, 52, 4, 33, 15, 2, 51, 47
Ordered by size: 2, 4, 6, 10, 15, 20, 33, 46, 47, 51, 52
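The same steps (order, count, take the middle value) can be checked in Python using the eleven values from this slide. This is only an illustration; how the system handles an even number of observations is not stated in the slides, and `statistics.median` would average the two middle values in that case.

```python
from statistics import median

# Values for question 3272 as they appear on the slide, before ordering.
values = [20, 46, 6, 10, 52, 4, 33, 15, 2, 51, 47]

ordered = sorted(values)               # order the values by size
count = len(ordered)                   # 11 observations
middle_value = ordered[count // 2]     # 6th of 11 values

print(ordered)       # [2, 4, 6, 10, 15, 20, 33, 46, 47, 51, 52]
print(middle_value)  # 20
print(median(values) == middle_value)  # True
```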

21 What is an outlier? Non-typical, unusual or extreme (large or small) values, relative to the rest of the data. Outliers can be: non-representative (one-off values, often errors); or representative (there are similar values in the population).

22 Why do outliers occur? The ‘shape’ of the population: skewness; large variability. Problems with the frame and sample design: misclassifications; poor relationship between stratification and survey variables. Errors: data capture error; response error.

23 Outliering methods used in FDI 1) Distance from the Mean: trims the data according to a set number of standard deviations. 2) Winsorisation: an outliering process used to identify responses that are different from the other responses within their group. These data points are then amended, prior to implementing other processes, to values that are deemed to be within an acceptable range.

24 Distance from the Mean A unit is an outlier if y_i is outside the tolerance interval ȳ ± c·s, where ȳ = trimmed sample mean, s = sample standard deviation and c = the set number of standard deviations. An outlier is excluded from estimation.
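As a rough sketch of this rule in Python: a value is flagged when it lies outside the trimmed mean plus or minus a set number of standard deviations. The slides do not give the trimming proportion or the number of standard deviations, so `trim_fraction=0.1` and `n_sd=3.0` are assumptions, as is computing the standard deviation over all values rather than the trimmed ones. The function names are hypothetical.

```python
from statistics import mean, stdev


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping the same share of values from each end (assumed rule)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    return mean(ordered[k:len(ordered) - k])


def distance_from_mean_outliers(values, n_sd=3.0, trim_fraction=0.1):
    """Return the values lying outside trimmed mean +/- n_sd standard deviations."""
    centre = trimmed_mean(values, trim_fraction)
    s = stdev(values)  # sample standard deviation of all values (assumption)
    lower, upper = centre - n_sd * s, centre + n_sd * s
    return [y for y in values if y < lower or y > upper]


data = [2, 4, 6, 10, 15, 20, 33, 46, 47, 51, 52, 1000]
print(distance_from_mean_outliers(data))  # [1000]
```

Values returned by `distance_from_mean_outliers` would then be excluded from estimation, as the slide describes.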

25 Winsorisation Assumption: sampled outliers are true values, not necessarily unique in the population. Method: identify outliers; decrease the values of sampled outliers that seem “too high”; non-outliers remain unchanged.

26 One-Sided Winsorisation [Figure: one-sided Winsorisation, showing the y axis from 0 and the cut-off k]

27 How Winsorisation is calculated In the case of an expansion estimator, the optimal cut-off is calculated as [formula not recovered in this transcript], where L is a Winsorisation parameter that is: computed from past data; chosen to minimise the Mean Square Error of the estimator; and updated regularly (by Methodology).

28 Winsorisation Use a trimmed mean to ensure robustness. FDI uses model-based estimation (more on this to follow), so no outlier weights are required. Reduce the value of the outlier to the k-value and include the reduced value in estimation.

29 Main differences between the methods Distance from the mean excludes data. Winsorisation does not exclude data points but alters the value to bring it closer to the mean.

30 Winsorisation One-sided winsorisation only treats large positive values as outliers. Some questions can contain both positive and negative numbers, so these data need to be split into two parts: the positive data; and the negative data, which are converted to absolute values to remove the negative signs. Winsorisation is then applied to both parts before the data are recombined.
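The split-and-recombine step can be sketched as follows. This assumes the cut-offs are already known; the slides say they are derived from the parameter L but the formula is not recoverable here, so `k_positive` and `k_negative` are simply hypothetical inputs, and the function names are illustrative.

```python
def winsorise_one_sided(values, k):
    """Reduce any value above the cut-off k to k; other values are unchanged."""
    return [min(y, k) for y in values]


def winsorise_signed(values, k_positive, k_negative):
    """Winsorise positive and negative values separately, then recombine in order.

    Negative values are converted to absolute values, capped at k_negative,
    and given their negative sign back.
    """
    result = []
    for y in values:
        if y >= 0:
            result.append(min(y, k_positive))
        else:
            result.append(-min(abs(y), k_negative))
    return result


print(winsorise_one_sided([5, 20, 300], k=100))  # [5, 20, 100]
print(winsorise_signed([5, -400, 300, -10], k_positive=100, k_negative=50))
# [5, -50, 100, -10]
```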

31 Application of methods In the first instance Distance from the Mean will be used for outliering all data. Once we have a better understanding of the new data coming in then Winsorisation will be turned on in the system.

32 What is estimation? A method of deriving values for companies that weren’t sampled. It ensures an overall data output can be provided for the population. It is only applied to strata that are not fully enumerated.

33 Method applied Weighted stratum mean: apply the mean for each stratum and question group to every non-sampled business. That’s it!
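One way to read this step is sketched below: every non-sampled business in a stratum is given the stratum mean, so the stratum total is the sampled total plus the mean times the number of non-sampled businesses. The slides do not describe the weights behind "weighted stratum mean", so this sketch assumes a simple unweighted mean; the function name is hypothetical.

```python
from statistics import mean


def estimate_stratum_total(sampled_values, population_size):
    """Estimate a stratum total by giving each non-sampled business the stratum mean.

    sampled_values: values for the sampled businesses (after imputation and outliering).
    population_size: number of businesses in the stratum, sampled or not.
    """
    n_sampled = len(sampled_values)
    if n_sampled == 0:
        raise ValueError("no sampled values in this stratum")
    if population_size < n_sampled:
        raise ValueError("population_size is smaller than the sample")
    n_non_sampled = population_size - n_sampled
    return sum(sampled_values) + mean(sampled_values) * n_non_sampled


# 3 sampled businesses in a stratum of 10: mean 20 applied to the other 7.
print(estimate_stratum_total([10, 20, 30], population_size=10))  # 200
# A fully enumerated stratum needs no estimation.
print(estimate_stratum_total([10, 20, 30], population_size=3))   # 60
```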

34 Aggregation The population values are then added up over industry and country groups to produce the final set of results.

35 Primary Disclosure Three rules are applied to test for disclosive data. If a value meets any one of these rules then it is disclosive and is suppressed. Rules:
1. Fewer than 3 wowentrefs within a cell
2. Largest value > 91% of the total within a cell
3. Fewer than 19 RUs within a cell and (total value - (largest + 2nd largest value)) < 0.1 * largest value
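The three rules could be expressed as a single check on one cell, as in the sketch below. It assumes each entry in `ru_values` is the non-negative contribution of one reporting unit (RU) and that the wowentref count is supplied separately; the function and parameter names are hypothetical.

```python
def is_disclosive(ru_values, n_wowentrefs):
    """Return True if the cell meets any of the three primary disclosure rules."""
    if not ru_values:
        raise ValueError("cell has no contributing values")
    total = sum(ru_values)
    ordered = sorted(ru_values, reverse=True)
    largest = ordered[0]
    second_largest = ordered[1] if len(ordered) > 1 else 0

    rule_1 = n_wowentrefs < 3
    rule_2 = largest > 0.91 * total
    rule_3 = (len(ru_values) < 19
              and (total - (largest + second_largest)) < 0.1 * largest)
    return rule_1 or rule_2 or rule_3


print(is_disclosive([100, 100, 100, 100], n_wowentrefs=4))  # False
print(is_disclosive([100, 1], n_wowentrefs=2))              # True (rule 1)
print(is_disclosive([950, 20, 20, 10], n_wowentrefs=4))     # True (rule 2)
print(is_disclosive([500, 400, 30, 10], n_wowentrefs=4))    # True (rule 3)
```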

36 Secondary Disclosure Further suppression is required as it is possible to recalculate some of the suppressed values within a group if only one value has been suppressed

37 Example Malta can be calculated as the EU total minus all other countries: Malta = 1403

38 Example Malta can be calculated as the EU total minus all other countries: Malta = 1403. Suppress a 2nd country to hide Malta’s value.
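The reason a second suppression is needed can be shown with a small sketch. Apart from Malta = 1403, all figures and country names below are made up for illustration, since the table on the slide is not reproduced in this transcript; the function name is hypothetical.

```python
def recover_suppressed(total, cells):
    """Return {name: value} when exactly one suppressed cell can be recalculated.

    cells maps a name to its published value, or to None when suppressed.
    """
    suppressed = [name for name, value in cells.items() if value is None]
    if len(suppressed) != 1:
        return {}
    published = sum(value for value in cells.values() if value is not None)
    return {suppressed[0]: total - published}


eu_total = 10_000  # illustrative figure, not from the slides
one_suppressed = {"Malta": None, "Country X": 5_000, "Country Y": 3_597}
two_suppressed = {"Malta": None, "Country X": None, "Country Y": 3_597}

print(recover_suppressed(eu_total, one_suppressed))  # {'Malta': 1403}
print(recover_suppressed(eu_total, two_suppressed))  # {}
```

With two cells suppressed, only their combined value can be derived by subtraction, so Malta's individual value stays hidden.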

