- ONS Classification Coding Tools Project. Occupation Classification Workshop, RSS, London, 21 June 2004. Nigel Swier.

- Overview of ONS Coding Tools Project
  – Aim: to select and operationalise a standard tool for assigning classification codes to verbatim text responses given in answer to a question
  – Scope:
    – All classifications (except ICD-10 for cause-of-death coding), including occupation (SOC) and industry (SIC)
    – Both automatic and interactive coding functionality
    – Development of the selected tool into a component so that it can be used within the new ONS technical architecture
  – Context: part of the ONS Statistical Infrastructure Development Project (itself part of the ONS Statistical Modernisation Programme)

- Office for National Statistics
  – ONS formed in 1996 from:
    – Central Statistical Office (CSO)
    – Office of Population Censuses and Surveys (OPCS)
    – Employment Department

- Statistical Modernisation Programme (SMP)
  – Inherited infrastructure:
    – Multiple databases
    – Multiple development tools
    – Proliferation of statistical tools and methods
    – Poor metadata
    – Paper-based dissemination
    – Risky statistical systems
  – ONS vision:
    – Single repository (Oracle)
    – Java (J2EE)
    – Standard statistical tools and methods (e.g. coding tool)
    – Corporate metadata system
    – Web-based dissemination
    – Robust statistical systems
  – £75 million to deliver SMP ( )

- Statistical Value Chain
  – Data Collection: survey design, survey case management
  – Operations on Unit Data: editing, imputation, coding
  – Operations on Aggregate Data: time series, tabulation, disclosure control, weighting, estimation
  – Dissemination
  – Supporting infrastructure: ONS Metadata Repository, Corporate ONS Repository for Data (CORD), Common ONS Statistical Tools

- Benefits of Statistical Modernisation
  – Robust statistical systems
  – Automated workflow:
    – More rapid publishing of statistical outputs
    – Improved efficiency
    – Improved job satisfaction
  – Data will be a corporate resource; together with improved metadata, this will allow ONS to get greater value from its data holdings
  – Reduced licensing and IT support costs
  – Reduced staff training costs and easier transferability of staff

- Evaluation criteria
  – Functionality:
    – Automatic and interactive coding
    – Able to handle simple and complex classifications
    – Dependent coding
  – Performance (coding/agreement rates)
  – Technical (fit with new ONS technical environment)
  – Supplier support
  – Impact on ONS outputs
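
The slide does not define the two performance measures. A common reading, assumed here, is that the coding rate is the share of records the tool codes automatically, and the agreement rate is the share of those automatic codes that match a reference code (for example, one assigned by an expert coder). A minimal Python sketch under that assumption; the function name and the codes in the example are hypothetical:

```python
from typing import Optional, Sequence


def coding_and_agreement_rates(
    auto_codes: Sequence[Optional[str]], reference_codes: Sequence[str]
) -> tuple[float, float]:
    """Return (coding_rate, agreement_rate) for one test file.

    auto_codes[i] is the code assigned automatically (None if the record was
    left uncoded); reference_codes[i] is the code assumed to be correct.
    """
    if len(auto_codes) != len(reference_codes):
        raise ValueError("auto_codes and reference_codes must be the same length")
    if not auto_codes:
        return 0.0, 0.0
    coded = [(a, r) for a, r in zip(auto_codes, reference_codes) if a is not None]
    # Coding rate: share of all records that received an automatic code.
    coding_rate = len(coded) / len(auto_codes)
    # Agreement rate: share of automatically coded records matching the reference.
    agreement_rate = sum(a == r for a, r in coded) / len(coded) if coded else 0.0
    return coding_rate, agreement_rate


# Made-up codes, for illustration only.
print(coding_and_agreement_rates(["A", None, "B", "C"], ["A", "X", "Z", "C"]))
```

In the example, three of the four records are coded automatically (coding rate 0.75) and two of those three agree with the reference (agreement rate of two-thirds).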

- Evaluating and selecting the tool
  – Started (in earnest) January 2003
  – Establish detailed evaluation criteria
  – Investigate tools and identify a shortlist (ACTR, PDC)
  – Obtain software; prepare knowledge bases for testing; prepare test data
  – Testing (automatic coding performance)
  – Analysis of results
  – Evaluate supplier comments and tool functionality
  – Compilation of scores
  – Final report (completed December 2003) => recommendation to select ACTR

- ACTR: the selected tool
  – Automated Coding by Text Recognition
  – Developed by Statistics Canada
  – Used by Lockheed Martin for the Census 2001 Processing System
  – Automatic and interactive coding
  – Consists of a coding engine and maintenance tools; the customer builds and tunes the coding index
  – Generic: can code a range of classifications
  – Flexible: allows different coding strategies and thresholds
  – Has an API and has been ported to UNIX and Windows
  – Multiple coding databases
  – Dependent coding using filters
  – Powerful parsing capabilities
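
The slide lists dependent coding using filters without further detail. One way to picture it, assumed here rather than taken from ACTR's documentation, is an index whose entries can be tied to the value of a second variable (an industry, say), so that the same text can receive a different code depending on that variable. The `IndexEntry` type, the `dependent_code` function and the codes are hypothetical, and exact text comparison stands in for ACTR's real matching:

```python
from typing import NamedTuple, Optional


class IndexEntry(NamedTuple):
    description: str                     # parsed index text
    code: str                            # classification code
    filter_value: Optional[str] = None   # None means the entry applies to every record


def dependent_code(description: str, filter_value: Optional[str],
                   index: list[IndexEntry]) -> Optional[str]:
    """Code a description, letting a second variable narrow the candidate entries."""
    candidates = [
        e for e in index
        if e.description == description and e.filter_value in (None, filter_value)
    ]
    # Entries tied to this record's filter value outrank generic ones
    # (False sorts before True, so filter-specific entries come first).
    candidates.sort(key=lambda e: e.filter_value is None)
    return candidates[0].code if candidates else None


index = [
    IndexEntry("operator", "GEN"),
    IndexEntry("operator", "TEL", filter_value="telecoms"),
]
print(dependent_code("operator", "telecoms", index))  # TEL
print(dependent_code("operator", "retail", index))    # GEN
```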

- Parsing
  – Manipulation of text using global rules, to:
    – Normalise, or reduce variation in, text
    – Tune the coding application
  – Examples:
    – Replace/delete string
    – Replace/delete word (synonym list)
    – Delete clause
  – Applied to both reference files (i.e. the coding index) and input files
  – Parsing data + coding index = knowledge base
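
The parsing rules can be illustrated as a single normalising function. This is a sketch only: the rule tables are hypothetical, clause deletion is modelled as removing a fixed phrase, and ACTR's own parsing-file format is not reproduced. The point it shows is that the same `parse` is run on index entries and on survey responses, so variants of one job title reduce to the same form:

```python
import re

# Hypothetical rule tables; ACTR keeps its own parsing files, not shown here.
STRING_REPLACEMENTS = {"&": " and ", "-": " ", "/": " "}
WORD_SYNONYMS = {"asst": "assistant", "mgr": "manager"}
DELETED_WORDS = {"a", "of", "the"}
DELETED_CLAUSES = [r"\bpart time\b", r"\bself employed\b"]


def parse(text: str) -> str:
    """Normalise a description so that variants of the same job reduce to one form."""
    s = text.lower()
    # Replace/delete string: character-level substitutions applied anywhere.
    for old, new in STRING_REPLACEMENTS.items():
        s = s.replace(old, new)
    # Delete clause: drop whole phrases that carry no coding information.
    for pattern in DELETED_CLAUSES:
        s = re.sub(pattern, " ", s)
    # Replace/delete word: apply the synonym list, then drop noise words.
    words = [WORD_SYNONYMS.get(w, w) for w in re.findall(r"[a-z0-9]+", s)]
    return " ".join(w for w in words if w not in DELETED_WORDS)


# The same rules apply to an index entry and to a survey response.
print(parse("Assistant Manager, Shop"))           # assistant manager shop
print(parse("Asst Mgr of the Shop (part-time)"))  # assistant manager shop
```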

- ACTR matching algorithm
  – Matching always follows parsing
  – Step 1: find direct matches and assign codes
  – Step 2: find indirect matches (using the Hellerman algorithm):
    – Match scores are based on word frequencies across the index
    – Unmatched words are ignored (although more unmatched words lower the score)
    – No fuzzy matching (except through parsing rules)
  – Step 3: assign codes based on user-defined match parameters
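
The three matching steps can be approximated in Python. This is a sketch, not ACTR's implementation: step 2 swaps in a simple inverse-document-frequency weighting of shared words in place of the Hellerman algorithm, and the `threshold` default of 0.6 is an arbitrary stand-in for ACTR's user-defined match parameters. Inputs are assumed to be parsed already:

```python
import math
from collections import Counter
from typing import Optional


def word_weights(index: dict[str, str]) -> dict[str, float]:
    """Weight each index word by rarity: words in fewer entries are more informative."""
    n = len(index)
    doc_freq = Counter(w for desc in index for w in set(desc.split()))
    return {w: math.log(1 + n / df) for w, df in doc_freq.items()}


def match(response: str, index: dict[str, str],
          threshold: float = 0.6) -> tuple[Optional[str], float]:
    """Return (code, score); code is None when the best score misses the threshold."""
    # Step 1: direct match on the parsed text.
    if response in index:
        return index[response], 1.0
    # Step 2: indirect match, scoring shared words by their index-wide weight.
    weights = word_weights(index)
    words = set(response.split())
    best_code, best_score = None, 0.0
    for desc, code in index.items():
        entry = set(desc.split())
        shared = sum(weights[w] for w in words & entry)
        # Unmatched words stay in the denominator, so they pull the score down;
        # words never seen in the index get a default weight of 1.0.
        total = sum(weights.get(w, 1.0) for w in words | entry)
        score = shared / total if total else 0.0
        if score > best_score:
            best_code, best_score = code, score
    # Step 3: assign a code only if the best score reaches the threshold.
    if best_score >= threshold:
        return best_code, best_score
    return None, best_score
```

With a small hypothetical index such as `{"sales assistant": "A", "sales manager": "B", "shop assistant": "A"}`, `match("retail sales assistant", index)` scores about 0.65 against "sales assistant" and returns code "A", while `match("lorry driver", index)` shares no words with any entry and returns `(None, 0.0)`.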

- Building the knowledge base for SOC 2000
  – Based on the SOC 2000 index
  – Obtain test/tuning data (Census 1991 recoded descriptions)
  – Development of parsing strategy
  – Iterative development
  – Index partitioned into 2 contexts:
    – Main index entries
    – Default index
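
The slide does not say how the two contexts interact. Assuming the default index is consulted only when the main index entries give no code, the lookup order could be sketched as below; the context names, the codes and the exact-match stand-in for ACTR's matcher are all hypothetical:

```python
from typing import Callable, Optional

# A matcher takes a parsed response and one index, and returns a code or None.
Matcher = Callable[[str, dict[str, str]], Optional[str]]


def exact_match(response: str, index: dict[str, str]) -> Optional[str]:
    """Stand-in matcher: exact lookup of the parsed response."""
    return index.get(response)


def code_with_contexts(
    response: str,
    contexts: list[tuple[str, dict[str, str]]],
    matcher: Matcher = exact_match,
) -> tuple[Optional[str], Optional[str]]:
    """Try each context in order; return (code, name of the context that coded it)."""
    for name, index in contexts:
        code = matcher(response, index)
        if code is not None:
            return code, name
    return None, None  # uncoded: a candidate for interactive coding


contexts = [
    ("main", {"staff nurse": "X1"}),
    ("default", {"nurse": "X2"}),
]
print(code_with_contexts("staff nurse", contexts))  # ('X1', 'main')
print(code_with_contexts("nurse", contexts))        # ('X2', 'default')
```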

- ACTR shortcomings
  – Non-linguistic: ignores word order (e.g. "Clerk to the Council" is not equivalent to "Council Clerk", but word order alone cannot separate them)
  – No fuzzy matching (although particular cases of missing spaces and misspellings can be handled through parsing)
  – Longer text strings are difficult to code automatically
  – No classification mapping facility
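
The word-order limitation can be shown with an order-blind word set of the kind the matching step works on. The sketch uses hypothetical deletion and phrase rules, and also shows one possible workaround that is an assumption rather than something the slide claims: a string-replacement parsing rule that collapses the longer title into a single token before matching.

```python
# Hypothetical deletion list of the kind a parsing strategy might use.
DELETED_WORDS = {"the", "to"}
# Hypothetical phrase rule turning a multi-word title into one token before matching.
PHRASE_RULES = {"clerk to the council": "clerktothecouncil"}


def words_for_matching(text: str, use_phrase_rules: bool = False) -> frozenset[str]:
    """Reduce a description to the unordered set of words an order-blind matcher sees."""
    s = text.lower()
    if use_phrase_rules:
        for phrase, token in PHRASE_RULES.items():
            s = s.replace(phrase, token)
    return frozenset(w for w in s.split() if w not in DELETED_WORDS)


# Without a phrase rule the two titles look identical to the matcher.
print(words_for_matching("Clerk to the Council") == words_for_matching("Council Clerk"))
# With the rule they stay distinct.
print(words_for_matching("Clerk to the Council", True) == words_for_matching("Council Clerk", True))
```

The first comparison prints `True` and the second `False`.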

- Next steps?
  – Short term: building knowledge bases
  – Medium term: implementing ACTR in individual business areas:
    – ASHE (Earnings), for coding occupation in April 2005
    – IDBR (Industry)
  – Medium/long term: operationalising ACTR in the new ONS environment, including CORD etc.

- The End