DEVELOPMENT OF CASCOT 5.0 (a multi-language text coding tool) Presentation to the DASISH project meeting, Gothenburg, 27-28 November 2014 Peter Elias Margaret.

Presentation transcript:

DEVELOPMENT OF CASCOT 5.0 (a multi-language text coding tool)
Presentation to the DASISH project meeting, Gothenburg, November 2014
Peter Elias, Margaret Birch and Ritva Ellison
Institute for Employment Research

What are the problems with occupation coding?
- Occupation is a standard measure on most national social surveys (ISCO-08)
- Occupation data are not straightforward to collect and arrive in non-standard (unstructured) form
- Responses must be harmonised to a classification of at most four digits
- Accurate coding requires specialist knowledge and expertise

Computer Assisted Structured Coding Tool (CASCOT)
- Software tool for coding text automatically or manually, or mixing the two modes
- Developed at the Institute for Employment Research at the University of Warwick
- Used by over 100 organisations (public research, private sector, statistical agencies)
- Fast, with a sophisticated coding engine
- Desktop version, API available: see

[Screenshot of the CASCOT interface. Callouts: enter text (could be from a file); selected classification; output can be directed to a file.]

IER contracted within DASISH to develop a multilingual version of CASCOT to code job titles to ISCO-08 (WP3)
CASCOT upgraded to provide:
- a user interface which is presented in up to 3 selected European languages;
- classification files which permit coding of text in selected languages to the appropriate national occupational classification and to ISCO-08 at four digits;
- a software tool which will facilitate evaluation of coded text files.
Upgraded to facilitate future extension by incorporating additional languages as and when relevant index material becomes available.

DASISH: CASCOT development
- User interface in 8 languages: Dutch, English, French, German, Italian, Portuguese, Slovak and Spanish
- ISCO-08 classification (structure, index) prepared for each country
- Simultaneous coding into ISCO-08 and national occupational code possible
- Development of CASCOT Performance Tool
- Raw data files from the European Social Survey (ESS) Round 6 used for validation
- Partnership arrangements for testing and fine-tuning by experts within each country

Collaboration and Training
- DASISH WP3 Group meeting, London, April 2012
- DASISH Quantitative Workshop, Mannheim, December 2012
- GESIS Workshop on Developments in Occupational Coding, Mannheim, February 2013
- Meeting with University Ca' Foscari to discuss development of Italian version, Venice, August 2013
- InGRID workshop on 'Tools for harmonising the measurement of occupations', Amsterdam, February 2014
- CASCOT Training Workshop, Venice, April 2014
- Users currently trialling/purchasing CASCOT v5.0

CASCOT Performance Tool
- Allows the user to analyse the performance of CASCOT by comparing data coded to a 'Gold Standard' with the codes produced by CASCOT for the same data.
- The Tool displays a Performance Graph, Summary and Interactive statistics.
- Enables the user to decide how much text should be coded automatically and how much is left for (labour-intensive) human intervention.
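The trade-off the Performance Tool addresses can be illustrated with a minimal Python sketch. It is not the Tool's own implementation: the record layout, the 0-100 certainty `score` field, the function name `performance_at_threshold` and the four sample records are all assumptions made for illustration. The idea is that automatic codes are accepted only when the score reaches a threshold, and the Gold Standard shows how accurate the accepted codes are.

```python
from dataclasses import dataclass


@dataclass
class CodedRecord:
    text: str        # job title as given by the respondent
    gold_code: str   # code assigned by an expert coder (Gold Standard)
    auto_code: str   # code proposed by the automatic coder
    score: int       # hypothetical certainty score, 0-100


def performance_at_threshold(records, threshold):
    """Return (share coded automatically, agreement rate among those).

    A record counts as coded automatically when its score is at or
    above the threshold; everything else is left for a human coder.
    """
    if not records:
        return 0.0, 0.0
    accepted = [r for r in records if r.score >= threshold]
    share = len(accepted) / len(records)
    if not accepted:
        return share, 0.0
    agreement = sum(r.gold_code == r.auto_code for r in accepted) / len(accepted)
    return share, agreement


# Invented sample data, for illustration only.
records = [
    CodedRecord("staff nurse", "2221", "2221", 92),
    CodedRecord("primary school teacher", "2341", "2341", 85),
    CodedRecord("manager", "1120", "1219", 35),
    CodedRecord("lorry driver", "8332", "8332", 78),
]

for threshold in (0, 50, 90):
    share, agreement = performance_at_threshold(records, threshold)
    print(f"threshold {threshold}: {share:.0%} automatic, {agreement:.0%} agreement")
```

Raising the threshold leaves more text for manual coding but tends to raise agreement among the automatically coded records, which is the decision the slide describes.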

Research project: Health, Occupation, and the role of Measurement Error (HOME)
- A project designed to extend previous work (see poster in this room): Belloni, M., Brugiavini, A., Meschi, E. and Tijdens, K. (2014). Measurement error in occupational coding: an analysis on SHARE data. Universiteit van Amsterdam, AIAS Working Paper
- HOME was set up during a visit to AIAS-UvA (Kea Tijdens), thanks to an InGRID visiting grant
- Access to restricted ELSA files has been funded by DASISH

Aim and steps of HOME
1. Recode open-ended answers from ELSA using CASCOT
2. Evaluate agreement rates between ELSA and CASCOT
3. Convert ISCO codes into job features that may affect health:
   - whether the job is physically demanding,
   - psychosocial factors of the job,
   - exposure to risk factors, etc.
   This conversion requires further data (and funding…)
4. Estimate models exploring links between individuals' health and individuals' job characteristics
5. Evaluate whether and to what extent any estimated associations are sensitive to occupational miscoding.
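Step 2 can be sketched in Python as follows. This is an illustrative assumption about how agreement might be measured, not the HOME project's actual method: because ISCO-08 is hierarchical, two four-digit codes can be compared on their first one, two, three or four digits. The function name `agreement_by_digits` and the sample pairs are hypothetical.

```python
def agreement_by_digits(pairs, max_digits=4):
    """Agreement rate between two codings at each level of the hierarchy.

    pairs: iterable of (original_code, recoded_code) strings.
    Pairs in which either code has fewer than max_digits characters
    are dropped, so every comparison uses full-length codes.
    """
    usable = [(a, b) for a, b in pairs
              if len(a) >= max_digits and len(b) >= max_digits]
    rates = {}
    for digits in range(1, max_digits + 1):
        matches = sum(a[:digits] == b[:digits] for a, b in usable)
        rates[digits] = matches / len(usable) if usable else 0.0
    return rates


# Invented (survey code, recoded code) pairs, for illustration only.
pairs = [
    ("2221", "2221"),  # identical at four digits
    ("2341", "2342"),  # same three-digit group, different unit group
    ("5223", "5131"),  # same major group only
    ("8332", "7231"),  # different major group
]

for digits, rate in agreement_by_digits(pairs).items():
    print(f"{digits}-digit agreement: {rate:.0%}")
```

Reporting agreement at each digit level shows whether disagreements are minor (within the same group) or cross major occupational groups, which matters for step 5.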

What have we learnt during DASISH?
- Modifications to CASCOT text matching
  - Handling of compound words
  - Treatment for equivalent word endings
  - Processing/non-processing of spaces
- Modifications to CASCOT
  - Add text descriptions to the classification structure to provide more information for the coder
- The need for 'Gold Standard' coded text files
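The three text-matching issues listed above can be illustrated with a small Python sketch. It does not reproduce CASCOT's matching engine: the vocabulary, the single ending rule (German feminine "-erin" treated as equivalent to "-er") and the function names are all hypothetical, chosen only to show why compound words, word endings and spacing each need a rule before two job titles can be recognised as the same.

```python
# Hypothetical index vocabulary and ending equivalences (not CASCOT's own).
VOCABULARY = {"bus", "fahrer", "lehrer"}
EQUIVALENT_ENDINGS = {"erin": "er"}  # e.g. "lehrerin" is matched as "lehrer"


def canonical_ending(word):
    """Replace a known equivalent ending with its canonical form."""
    for ending, replacement in EQUIVALENT_ENDINGS.items():
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)] + replacement
    return word


def split_compound(word, vocabulary):
    """Split a compound into known parts; return [word] if no full split exists."""
    if word in vocabulary:
        return [word]
    for i in range(len(word) - 1, 0, -1):  # try the longest prefix first
        head, tail = word[:i], word[i:]
        if head in vocabulary:
            rest = split_compound(tail, vocabulary)
            if all(part in vocabulary for part in rest):
                return [head] + rest
    return [word]


def match_key(text, vocabulary):
    """Build a key that ignores case, spacing, endings and compounding."""
    parts = []
    for token in text.lower().split():  # split() also collapses repeated spaces
        parts.extend(split_compound(canonical_ending(token), vocabulary))
    return " ".join(parts)


print(match_key("Busfahrerin", VOCABULARY))
print(match_key("Bus  Fahrer", VOCABULARY))
```

Both inputs reduce to the same key, so an index entry written one way would match a response written the other way. Rules of this kind are language-specific, which is consistent with the slide's point that each language needs its own treatment.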

Next steps?
- Extend the development and testing of language-based coding rules in all relevant languages
- Continue fine-tuning the software to re-code job title text already coded to ISCO-08 (demonstrated at the Venice training workshop, April 2014)
NB
- Resource-demanding and time-consuming for each language.
- Expertise needed in language and occupation. (English coding rules were developed in parallel with CASCOT over a decade or more.)
- Dependent on further funding.

Further information
CASCOT
Institute for Employment Research
University of Warwick