Big Data vs. Official Statistics
Yu gyung Kang, Director, Statistical Information Portal Division, Statistics Korea
Directors General of the National Statistical Institutes Meeting, 25-27 September 2013, The Hague, Netherlands
Contents
– Technology Assessment (TA) in Korea
– Big Data Use in Private Sector
   – Market Analysis
   – Suicide Warning System
– On-going Projects by KOSTAT
   – Pilot Project for Mining and Manufacture Survey
   – E-household Account System
   – Pilot Project for Price Statistics
– Future Challenges
Technology Assessment (1)
Conducted by MSIP of Korea in 2012, under Article 14 of the Framework Act on Science and Technology
What is big data?
– Data with 3Vs characteristics + Data Management Technology
* Gartner's 3Vs: Volume, Variety and Velocity
– Volume: GB/TB, PB, EB, ZB
– Variety: Structured Data (Customer Data, Sale Data, Stock Data, Finance Data) and Unstructured Data (Video, Music, Messages, SNS, GPS, BBS)
– Velocity: from Low speed (hours to weeks) to High speed (mins. to seconds)
Technology Assessment (2)
Expected Impact on the Private Sector, the Public Sector and Individuals
Expected benefits
– Source of new value creation
– Supporting efficient decision-making
– Providing business opportunities and jobs
– Improving public service and its efficiency
– Real-time response to social issues
– Creating new industry and job opportunities
– Improving quality of life with individually tailored service
– Increasing trust in public policies and service
Expected risks
– Aggravating economic inequality
– Possibility of wasting money due to careless massive investment
– Social problems caused by unethical use of data
– Increasing risk of leaking government secrets
– 'Big Brother'
– Misuse of big data containing errors and its negative impact on government policies
– Increase of privacy and security issues
Technology Assessment (3)
Policy Recommendations
a. Localize Core Technologies related to big data through government-led R&D
b. Establish Legal and Institutional Basis for standardization of managing, sharing and trading big data
c. Foster pool of Big Data Analysts and Experts through interdisciplinary undergraduate and graduate programs
d. Take a Step-By-Step Approach by Setting Priorities in the sectors where benefits to the public will be visible
e. Make Strategies to Protect Privacy
Big Data Use in Private Sector
Case 1: Market Analysis by
Which business would you like to open?
Big Data Use in Private Sector
Case 1: Market Analysis by
Diagram: Floating Population, Consumer Type, Sales Information, Real Estate and Business Cycle data, drawn from Real Estate 411, … and the Korean Statistical Information Service
Big Data Use in Private Sector
Case 2: Suicide Warning System
Weather forecast: why not a suicide forecast?
– Social factors
– Weather factors
– Werther Effect
– Personal emotion
Source: OECD (2012), OECD Health Statistics
Big Data Use in Private Sector
Case 2: Suicide Warning System
Training Set ( ) & Test Set (2010)
– Total number of suicide incidents
– Economic and weather data: CPI, unemployment rate, KOSPI (Korea Composite Stock Price Index), daylight hours and temperature
– 150 million posts from about 5 million blogs on NAVER (incl. SNS posts)
   Var1: number of posts including "suicide"
   Var2: number of posts including "dysphoria", "be tired", "be painful" or "be exhausted"
Model
– Dependent Variable: number of suicides in a given period (3 days)
– Independent Variables
   CPI, unemployment rate, KOSPI, daylight hours, temperature
   The two variables obtained from the posts (Var1, Var2)
   Celebrity suicide (control variable)
   Number of suicides in the previous period
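The model structure on this slide can be sketched in a few lines. The slide does not state the functional form, so the sketch below assumes an ordinary least-squares regression with a one-period lag of the dependent variable. All function names are hypothetical, the data are synthetic, and the economic and weather regressors (CPI, unemployment rate, KOSPI, daylight hours, temperature) are left out for brevity; they would enter as further columns.

```python
import random

# English glosses from the slide; the actual study searched Korean-language posts.
SUICIDE_TERMS = ("suicide",)
DISTRESS_TERMS = ("dysphoria", "be tired", "be painful", "be exhausted")


def count_posts_with_terms(posts, terms):
    """Count posts that contain at least one of the given terms (Var1 / Var2)."""
    return sum(any(term in post.lower() for term in terms) for post in posts)


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def build_design(periods):
    """Turn time-ordered 3-day periods into regressors with a one-period lag."""
    rows, y = [], []
    for prev, cur in zip(periods, periods[1:]):
        rows.append(
            [1.0, cur["var1"], cur["var2"], cur["celebrity"], prev["suicides"]]
        )
        y.append(cur["suicides"])
    return rows, y


def simulate_periods(n, coefs, noise_sd=0.0, seed=0):
    """Synthetic periods generated from known coefficients (illustration only)."""
    rng = random.Random(seed)
    b0, b1, b2, b3, b4 = coefs
    periods = [{"var1": 30.0, "var2": 50.0, "celebrity": 0.0, "suicides": 12.0}]
    for t in range(1, n):
        cur = {
            "var1": rng.uniform(10, 50),
            "var2": rng.uniform(20, 80),
            "celebrity": 1.0 if t % 10 == 0 else 0.0,
        }
        cur["suicides"] = (
            b0 + b1 * cur["var1"] + b2 * cur["var2"] + b3 * cur["celebrity"]
            + b4 * periods[-1]["suicides"] + rng.gauss(0.0, noise_sd)
        )
        periods.append(cur)
    return periods


# Made-up coefficients: intercept, Var1, Var2, celebrity suicide, lagged suicides.
true_coefs = (5.0, 0.02, 0.01, 3.0, 0.5)
rows, y = build_design(simulate_periods(120, true_coefs, noise_sd=0.5))
print([round(b, 3) for b in fit_ols(rows, y)])
```

In a real application the training periods would be used to fit the coefficients and the 2010 periods held out to check forecasts, as the training/test split on the slide suggests.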
What should NSOs do? Challenge!
Official statistics
– Established theoretical basis
– Representativeness of target population
– Relatively slow
– Expensive data collection
Big data
– Quantity beats quality
– Lack of representativeness of target population
– More timely
– Data already there
KOSTAT tried…
October 2012~March 2013
– Organized seminars once or twice a month, inviting outside big data experts
– Aimed to raise awareness of big data and its impact on producing official statistics
December 2012~April 2013
– A pilot project on the use of big data in the process of editing existing national statistics
– Used media data to examine outliers when producing the Index of Industrial Production (IIP)
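One way media data could support outlier examination is sketched below. The slides do not describe KOSTAT's actual procedure, so everything here is an assumption: outliers are flagged with the modified z-score (median and MAD, with the commonly used 3.5 cutoff), and a flagged report is treated as explained when the establishment appears in a hypothetical set of media mentions such as news of a plant shutdown. The establishment names and figures are made up.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def triage_reports(changes, media_mentions, threshold=3.5):
    """Classify each reported month-on-month change (%) for the editing step.

    'accept'    : not an outlier
    'explained' : outlier, but a media report offers a plausible reason
    'query'     : outlier with no media explanation; follow up with respondent
    """
    names = list(changes)
    scores = modified_z_scores([changes[name] for name in names])
    result = {}
    for name, score in zip(names, scores):
        if abs(score) <= threshold:
            result[name] = "accept"
        elif name in media_mentions:
            result[name] = "explained"
        else:
            result[name] = "query"
    return result


# Hypothetical establishments and production changes (not KOSTAT data).
changes = {"A": 1.0, "B": -0.5, "C": 0.8, "D": 1.2, "E": -45.0, "F": 60.0, "G": 0.2}
media_mentions = {"E"}  # e.g. news coverage of a shutdown at establishment E
print(triage_reports(changes, media_mentions))
```

The median-based score is used instead of the mean and standard deviation because a few extreme reports would otherwise inflate the spread and mask themselves.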
KOSTAT is doing…
1. E-Diary System (Household Account System)
– Currently about 48.5% of sample households have adopted the e-Diary system.
– Respondents can import their expenditure information from online transactions with banks, credit card companies and major retail stores.
– Big data is used for the convenience of respondents.
KOSTAT is doing…
2. Pilot Project of Price Index
– KOSTAT is currently preparing a pilot project on compiling a price index using big data for a specific manufactured product.
– "Please select specific domains (or items) that can clearly show the difference between big data and existing statistics, e.g. TVs or electronic products." (Prof. Roberto Rigobon)
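As an illustration of what compiling a price index from collected prices can involve, the sketch below computes a Jevons elementary index, the geometric mean of price relatives for items observed in both periods. The slides do not say which formula the pilot will use, so the choice of Jevons is an assumption, and the item identifiers and prices are made up.

```python
import math


def jevons_index(base_prices, current_prices):
    """Jevons elementary index (base = 100) over items priced in both periods."""
    matched = sorted(set(base_prices) & set(current_prices))
    if not matched:
        raise ValueError("no items are priced in both periods")
    mean_log_relative = sum(
        math.log(current_prices[item] / base_prices[item]) for item in matched
    ) / len(matched)
    return 100.0 * math.exp(mean_log_relative)


# Hypothetical online prices for TV models (made-up numbers, not KOSTAT data).
base = {"tv-model-a": 1000.0, "tv-model-b": 800.0, "tv-model-c": 1200.0}
current = {"tv-model-a": 950.0, "tv-model-b": 800.0, "tv-model-d": 1500.0}
print(round(jevons_index(base, current), 2))
```

Only models present in both periods contribute, so a discontinued model (tv-model-c) and a newly listed one (tv-model-d) are ignored. Handling such product turnover is one of the practical issues a pilot on electronic products would have to address.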
Future Challenges
– Can we ignore big data just because of its representativeness issue, in spite of strengths such as timeliness?
– Can KOSTAT prevent over 380 statistical agencies from producing official statistics with big data?
Maybe not!
– We will have to make use of big data in producing statistics at some point in the future, as was the case with the transition from survey data to administrative data.
– We need to identify the limitations of big data through pilot projects and learn the techniques and know-how to refine big-data-based statistics into official statistics.