Statistical Office of the Republic of Slovenia

Presentation transcript:

Early estimates
Manca Golmajer
Statistical Office of the Republic of Slovenia
13 October 2016

ESSnet on Big Data: WP6: Early estimates
Aim:
- Investigate multiple data sources (big data, official statistical data, administrative data, etc.).
- Use combined data sources to create early estimates for statistics.
- Describe the process for the most promising combinations.

Overview of possible sources to be investigated
Big Data:
- Job vacancy ads from job portals
- Traffic loops
- Social media data (Twitter, Facebook, etc.)
- Supermarket scanner data
- News feeds/messages
Registers and existing sources:
- Statistical Register of Employment
- Data from the Employment Agency
- Tax data
- Wages and salaries
Surveys:
- Turnover data from various short-term surveys
- Consumer confidence index
- Business tendency
- …

Nowcasting turnover indices
- One of the pilots started in WP6.
- Statistics Finland (Henri Luomaranta et al.): interesting methodological suggestions for estimating early economic indicators → SURS decided to start testing with this idea.
- Modelling isn't new, but it is very often used in connection with big data sources.
- Modelling is very useful for estimating early economic indicators based on many different data sources.

Model (1)
Input 1: time series of interest (aggregate data)
time      TSI
2008M01   109.64
2008M02   113.51
2008M03   116.23
…
2015M12   95.78

Model (2)
Input 2: time series of enterprise data (microdata)
time      …              P973
2008M01   3526    214    66519
2008M02   4252    332    36012
2008M03   4111    411    52447
…
2015M12   5241    412    71025

Model (3)
Model: 2 stages
1. Principal component analysis (PCA)
   - dimensionality reduction
   - time series of enterprise data → standardize → choose the first few principal components
2. Linear regression
   - Y (dependent variable): time series of interest, e.g. turnover index
   - X1, …, Xn (predictors): e.g. the chosen principal components
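The two stages above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the SURS or Statistics Finland implementation (the slides name RStudio as the software). The function name `nowcast_last_period`, the synthetic data and the choice of two components are hypothetical; the sketch also assumes the enterprise data are already available for the period being estimated.

```python
import numpy as np

def nowcast_last_period(y, X, n_components):
    """Estimate the last value of y from enterprise series X (PCA, then OLS).

    y: shape (T,), series of interest; y[-1] is treated as not yet known.
    X: shape (T, P), enterprise microdata observed for all T periods.
    Returns the estimate and the share of variability of X explained
    by the chosen principal components.
    """
    # Stage 1a: standardize each enterprise series (constant series stay at 0).
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # Stage 1b: PCA via SVD of the standardized matrix.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # principal components
    explained = s**2 / np.sum(s**2)

    # Stage 2: linear regression with intercept, fitted without the last period.
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A[:-1], y[:-1], rcond=None)

    estimate = float(A[-1] @ coef)
    return estimate, float(explained[:n_components].sum())

# Synthetic illustration (hypothetical data, not the SURS series):
# 96 months, 50 enterprises driven by two common factors plus noise.
rng = np.random.default_rng(0)
T, P = 96, 50
factors = rng.normal(size=(T, 2))
X = factors @ rng.normal(size=(2, P)) + 0.1 * rng.normal(size=(T, P))
y = 100 + 3 * factors[:, 0] - 2 * factors[:, 1] + 0.1 * rng.normal(size=T)

estimate, share = nowcast_last_period(y, X, n_components=2)
print(f"estimate {estimate:.2f}, actual {y[-1]:.2f}, explained share {share:.1%}")
```

Fitting the regression on all periods except the last and then predicting the last one mirrors the "estimate for the last point in time" described in the slides; whether the original work fitted the model in exactly this way is an assumption.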

Model (4)
Output:
- An estimate for the series of interest's last point in time, e.g. 2015M12
- Others, e.g.:
  - Percentage of variability of the data explained by the chosen principal components
  - Percentage of variability of the time series of interest explained by the chosen linear regression model
  - Mean absolute error of the chosen linear regression model
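The three diagnostic outputs listed above can be computed as in the following minimal sketch. The function names and the toy numbers are hypothetical and chosen only so the arithmetic is easy to follow; "percentage of variability explained by the regression" is read here as the usual R², which is an assumption.

```python
def explained_share(variances, k):
    """Share of total variability explained by the first k principal components."""
    return sum(variances[:k]) / sum(variances)

def r_squared(actual, fitted):
    """Share of variability of the series of interest explained by the model."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(actual, fitted):
    """Average size of the fitting errors, in index points."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

# Toy values (hypothetical): component variances and a short index series.
variances = [5.0, 3.0, 1.5, 0.5]
actual = [100.0, 102.0, 98.0, 104.0]
fitted = [101.0, 101.0, 99.0, 103.0]

print(explained_share(variances, 2))        # 0.8
print(r_squared(actual, fitted))            # 0.8
print(mean_absolute_error(actual, fitted))  # 1.0
```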

Model (5)
Many possibilities for improving the models:
- Length of time series
- Data editing (e.g. imputations)
- Choice of principal components
- Additional predictors in linear regression
Many issues:
- Availability of the data
- Software: RStudio
- Quality of the model

First results of testing (1)
Example 1: Estimation of the last period
- Time series of interest: real turnover index in industry
- Time series of enterprise data: real turnover of 973 industrial enterprises
- Data: from 2008M01 to 2015M12 (8 years)
- Principal component analysis: 33 chosen principal components explain 80.2% of the variability of enterprise data
- Linear regression: 97.5% of the variability of the real turnover index in industry is explained
  - Maximum absolute error: 4.94
  - Mean absolute error: 1.04
  - Standard deviation of error: 1.32
- The last period is 2015M12:
  - Original value: 95.78
  - Estimate: 97.18
  - Absolute error: 1.40
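The figures for the last period can be checked with a few lines of arithmetic. The relative error and the ratio of observations to components are not stated on the slide; they are derived here from the reported numbers, assuming all 96 months of the 8-year series count as observations.

```python
original, estimate = 95.78, 97.18

absolute_error = abs(estimate - original)
relative_error = absolute_error / original

months = 8 * 12      # 2008M01 to 2015M12
components = 33      # chosen principal components
obs_per_component = months / components

print(f"absolute error: {absolute_error:.2f}")                  # 1.40
print(f"relative error: {relative_error:.2%}")                  # 1.46%
print(f"observations per component: {obs_per_component:.1f}")   # 2.9
```

The last figure indicates that the regression in Example 1 has only about three observations per predictor, which may be one reason to restrict the number of components.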

First results of testing (2)
Example 2: Estimation of the last periods under various conditions
- Time series of interest: real turnover index in industry
- Time series of enterprise data: real turnover of industrial enterprises
- Data: starting in 2008M01 and ending in a month from 2013M01 to 2015M12 (5 to 8 years)
- Principal component analysis: various conditions for choosing principal components:
  - C1: The chosen principal components explain at least 70% (75%, 80%, 85%, 90%) of the variability of enterprise data.
  - C2: The time series in the linear regression model are at least 7 (8, 10, 15, 20) times longer than the number of chosen principal components.
  - C3: The last chosen principal component explains at least 5% of the variability of enterprise data.
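One possible reading of the three conditions as selection rules is sketched below. The function names and the example variance shares are hypothetical; in particular, interpreting C2 as "take the largest number of components that still satisfies the length requirement" is an assumption.

```python
def choose_c1(shares, min_total_share):
    """C1: smallest k whose cumulative explained share reaches min_total_share."""
    total = 0.0
    for k, share in enumerate(shares, start=1):
        total += share
        if total >= min_total_share:
            return k
    return len(shares)

def choose_c2(n_periods, min_times):
    """C2: largest k such that the series is at least min_times * k periods long."""
    return n_periods // min_times

def choose_c3(shares, min_single_share):
    """C3: number of leading components that each explain at least min_single_share."""
    k = 0
    for share in shares:
        if share < min_single_share:
            break
        k += 1
    return k

# Hypothetical explained-variance shares, sorted from largest to smallest.
shares = [0.40, 0.20, 0.15, 0.10, 0.06, 0.04, 0.03, 0.02]

print(choose_c1(shares, 0.70))   # 3
print(choose_c2(96, 7))          # 13
print(choose_c2(61, 20))         # 3
print(choose_c3(shares, 0.05))   # 5
```

Under this reading, a 96-month series with the factor 7 allows 13 components and a 61-month series (2008M01 to 2013M01) with the factor 20 allows 3.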

First results of testing (3)
Conclusions:
- C1: 14 to 56 principal components are chosen. More than 96% of the variability of the real turnover index in industry is explained.
  - The last period: mean absolute relative error 1.8% to 2.7%; maximum absolute relative error 5.2% to 10.4%
  - The errors are often greater than expected.
- C2: 3 to 13 principal components are chosen. More than 88% of the variability of the real turnover index in industry is explained.
  - The last period: mean absolute relative error 2.1% to 2.7%; maximum absolute relative error 5.5% to 8.3%
- C3: not very promising.
- "70%", "75%", "7 times" and "8 times" seem to be the most promising.