The development of Cascot: Computer Aided Structured Coding Tool


Rob Jones
Institute for Employment Research, University of Warwick

Introduction
  Brief history
  How and when I became involved (started last September)

Project History
Two programs
  Casoc (SOC 2000 coding)
  Casic (SIC 92 coding)
  Undocumented, monolithic, legacy DOS code
Development undertaken
  Testing framework
  Modularisation (object orientation)
  Integration of Casoc and Casic into one program
  New GUI

Current Status
Single program, “Cascot”
  Capable of SOC 2000 and SIC 92 coding
  Any classification
Single scoring engine
Loadable classifications
  Structure
  Index
  Rules
Optional interfaces: web page and desktop application

Classification: Structure
Nature of the classification
Example: SOC 2000 has 4 levels, each entry a code and a title
  1 Managers and Senior Officials
    11 Corporate Managers
      111 Corporate Managers And Senior Officials
        1111 Senior officials in national government
        1112 Directors and chief executives of major organisations
        1113 Senior officials in local government
        1114 Senior officials of special interest organisations
      112 Production Managers
        1121 Production, works and maintenance managers
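The hierarchy in this excerpt can be read straight off the codes. The following is a minimal sketch, not Cascot's own data format: it assumes that a code's level equals its number of digits and that its parent is the code with the last digit removed, which holds for the entries shown here. The names `SOC2000_EXCERPT`, `level` and `ancestors` are hypothetical.

```python
# Hypothetical in-memory form of a classification structure (code -> title).
# Assumption: level = number of digits; parent = code minus its last digit.
SOC2000_EXCERPT = {
    "1": "Managers and Senior Officials",
    "11": "Corporate Managers",
    "111": "Corporate Managers And Senior Officials",
    "1111": "Senior officials in national government",
    "1112": "Directors and chief executives of major organisations",
    "112": "Production Managers",
    "1121": "Production, works and maintenance managers",
}


def level(code: str) -> int:
    """Level of a code in the hierarchy (1 = top level, 4 = unit group)."""
    return len(code)


def ancestors(code: str, structure: dict[str, str]) -> list[tuple[str, str]]:
    """Return (code, title) pairs from the top level down to the parent."""
    return [
        (code[:n], structure[code[:n]])
        for n in range(1, len(code))
        if code[:n] in structure
    ]


for parent_code, title in ancestors("1121", SOC2000_EXCERPT):
    print(parent_code, title)
```

Walking the prefixes of 1121 visits codes 1, 11 and 112 in that order.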

Classification: Index
Series of texts associated with given codes.
  2312 Teacher (educational establishments: college of education)
  2312 Teacher (further education)
  2312 Teacher (higher and further education)
  2312 Teacher (tertiary college)
  2312 Teacher, dance (further education)
  2312 Teacher, music (further education)
  2312 Teacher; head (educational establishments: further education .......
  2312 Tutor (further education)
  2312 Tutor (higher and further education)
  2313 Adviser (education)
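One way to picture the index is as a list of (code, text) pairs plus a word lookup built from it. This is only an illustrative sketch using three of the entries above; the names `INDEX`, `words` and `build_word_lookup` are hypothetical, and nothing here is taken from Cascot's actual storage format.

```python
import re
from collections import defaultdict

# Three index entries from the slide, as (code, text) pairs.
INDEX = [
    ("2312", "Teacher (further education)"),
    ("2312", "Tutor (further education)"),
    ("2313", "Adviser (education)"),
]


def words(text: str) -> list[str]:
    """Split a text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_lookup(index):
    """Map each word to the set of codes whose index entries use it."""
    lookup = defaultdict(set)
    for code, text in index:
        for word in words(text):
            lookup[word].add(code)
    return lookup


lookup = build_word_lookup(INDEX)
print(sorted(lookup["education"]))
print(sorted(lookup["tutor"]))
```

A lookup of this kind makes it cheap to find every code in which a given word is used.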

Classification: Rules
Abbreviations, e.g. deli = delicatessen
Misspellings, e.g. taylor = tailor
Thesaurus alternatives, e.g. cook = (95%) chef
Default values, e.g. BUSINESS MANAGER = company manager
Non-concluding text, e.g. Owner

Classification: Rules
Downgraded words, e.g. Trainee, Assistant, Senior
Noise words, e.g. and, of, with, in, at, the
Noise phrases, e.g. "My Mother is"
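Some of these rule types can be pictured as a clean-up pass over the input text. The sketch below covers only abbreviations, misspellings, noise words and noise phrases, using the examples from the slides. The order in which the rules are applied, and the names `ABBREVIATIONS`, `MISSPELLINGS`, `NOISE_WORDS`, `NOISE_PHRASES` and `normalise`, are assumptions made for illustration, not a description of how Cascot applies its rules.

```python
import re

# Rule tables populated with the examples from the slides.
ABBREVIATIONS = {"deli": "delicatessen"}
MISSPELLINGS = {"taylor": "tailor"}
NOISE_WORDS = {"and", "of", "with", "in", "at", "the"}
NOISE_PHRASES = ["my mother is"]


def normalise(text: str) -> list[str]:
    """Strip noise phrases and noise words, then expand abbreviations
    and correct known misspellings. The rule order is an assumption."""
    lowered = text.lower()
    for phrase in NOISE_PHRASES:
        lowered = lowered.replace(phrase, " ")
    result = []
    for word in re.findall(r"[a-z]+", lowered):
        word = ABBREVIATIONS.get(word, word)
        word = MISSPELLINGS.get(word, word)
        if word not in NOISE_WORDS:
            result.append(word)
    return result


print(normalise("My Mother is the taylor at the deli"))
```

Thesaurus alternatives, default values, non-concluding texts and downgraded words affect scoring rather than clean-up, so they are left out of this sketch.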

Principles of operation
Identify words.
Select all codes where those words are used.
Score all index entries in all those codes.
Score comprised of
  Global component
  Record component
2-way comparison (text-to-index and index-to-text)
Final score (0-100), known as the 'Certainty Score'
This is a simplified overview of the coding.
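The steps above can be sketched in a few lines. The slides do not give the scoring formula, so the one used here (the product of the text-to-index and index-to-text word overlaps, scaled to 0-100) is purely an assumption chosen to show a two-way comparison. The global and record components are not modelled, and the names `two_way_score` and `best_match` are hypothetical.

```python
import re

INDEX = [
    ("2312", "Teacher (further education)"),
    ("2312", "Tutor (further education)"),
    ("2313", "Adviser (education)"),
]


def words(text: str) -> set[str]:
    """Identify the distinct lower-case words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def two_way_score(text: str, entry: str) -> int:
    """Assumed formula: share of text words found in the index entry,
    multiplied by share of entry words found in the text, times 100."""
    text_words, entry_words = words(text), words(entry)
    if not text_words or not entry_words:
        return 0
    shared = len(text_words & entry_words)
    text_to_index = shared / len(text_words)
    index_to_text = shared / len(entry_words)
    return round(100 * text_to_index * index_to_text)


def best_match(text: str, index):
    """Score every index entry sharing a word with the text and return
    the highest (score, code, entry), or None if nothing matches."""
    candidates = [
        (two_way_score(text, entry), code, entry)
        for code, entry in index
        if words(text) & words(entry)
    ]
    return max(candidates, default=None)


print(best_match("further education tutor", INDEX))
```

Comparing in both directions penalises an index entry that contains the input words but also many others, as well as an input text only partly covered by the entry.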

Complexities of Scoring
The coding is in fact a lot more complex than the previous overview might lead you to believe. Some of the complexities include:
  Rules create alternatives.
  Non-concluding texts (in rules) => 39
  Words are 'pseudo matched' before being searched for, e.g. miner matches mine, miner, mineral, minerals, mines
  Final score adjusted by next closest score
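The slides do not say how pseudo matching is done. As a rough illustration only, the heuristic below treats two words as a pseudo match when they share a prefix of at least four letters that covers all but at most one letter of the shorter word. That rule, the `min_stem` parameter and the function names are assumptions, chosen so that the miner example from the slide comes out as stated.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters the two words have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def pseudo_match(a: str, b: str, min_stem: int = 4) -> bool:
    """Assumed heuristic: the shared prefix must be at least min_stem
    letters long and cover all but at most one letter of the shorter word."""
    a, b = a.lower(), b.lower()
    shared = common_prefix_length(a, b)
    return shared >= min_stem and shared >= min(len(a), len(b)) - 1


candidates = ["mine", "miner", "mineral", "minerals", "mines", "mint", "minister"]
print([w for w in candidates if pseudo_match("miner", w)])
```

With this rule, miner pseudo matches mine, miner, mineral, minerals and mines, but not mint or minister.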

Automatic & Assisted Modes
File input/output
Threshold level = certainty score
Assisted mode: score < threshold, user prompted
Automatic mode: score < threshold, no code written
In addition to coding individual texts, it is possible to run Cascot so as to block code files containing 1,000s of texts. This can be done in two modes: ..... The difference is Automatic will do everything ..... Assisted will prompt when appropriate.
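The difference between the two modes comes down to what happens when a score falls below the threshold. The sketch below assumes each text has already been given a best code and a certainty score. `block_code` and its `ask_user` callback are hypothetical names, and the callback stands in for whatever prompt the real interface shows.

```python
def block_code(scored_texts, threshold, ask_user=None):
    """Code a batch of (text, best_code, certainty_score) triples.

    Scores at or above the threshold are accepted as they are. Below it,
    assisted mode (ask_user given) asks the user for a code, while
    automatic mode (ask_user is None) writes no code, shown here as None.
    """
    results = []
    for text, code, score in scored_texts:
        if score >= threshold:
            results.append((text, code))
        elif ask_user is not None:
            results.append((text, ask_user(text, code, score)))
        else:
            results.append((text, None))
    return results


batch = [("tutor further education", "2312", 92), ("owner", "1121", 35)]

# Automatic mode: the low-scoring text is left uncoded.
print(block_code(batch, threshold=50))

# Assisted mode: a stand-in callback plays the part of the user.
print(block_code(batch, threshold=50, ask_user=lambda text, code, score: "1121"))
```

Raising the threshold in automatic mode leaves more texts uncoded; in assisted mode it produces more prompts.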

Performance
Can be measured in many ways: speed, throughput, accuracy
Speed: approx. 1,000 texts / minute
Main test data: LFS 96/97
  Total records: 63,251
  Compared to manual coding

Automatic text processing (SOC 2000): Throughput and error rates by certainty score

Automatic text processing (SIC 92): Throughput and error rates by certainty score
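Figures of this kind could be computed along the following lines. The definitions are assumptions, since the slides do not spell them out: throughput is taken as the share of texts whose certainty score reaches the threshold, and error rate as the share of those texts whose automatic code differs from the manual code. The records are invented purely to exercise the function and are not LFS results.

```python
def throughput_and_error_rate(records, threshold):
    """records: (certainty_score, automatic_code, manual_code) triples.

    Returns (throughput, error_rate) under the assumed definitions:
    throughput = coded texts / all texts,
    error_rate = coded texts disagreeing with manual coding / coded texts.
    """
    coded = [(auto, manual) for score, auto, manual in records if score >= threshold]
    throughput = len(coded) / len(records) if records else 0.0
    errors = sum(1 for auto, manual in coded if auto != manual)
    error_rate = errors / len(coded) if coded else 0.0
    return throughput, error_rate


# Invented illustrative records, NOT real test data.
RECORDS = [
    (90, "2312", "2312"),
    (75, "2313", "2312"),
    (60, "1121", "1121"),
    (30, "1111", "1112"),
]

for threshold in (50, 80):
    print(threshold, throughput_and_error_rate(RECORDS, threshold))
```

Under these definitions a higher threshold lowers throughput, and the trade-off against error rate is what such charts would show.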

The relationship between matching at SOC 2000 unit group level and the certainty score
[Chart: % matching at each value of certainty score, plotted against certainty score]

Future Work
Classification editor
  Editing of rules
  Editing of structure, index entries
  Creation of new classifications
Performance enhancements
Integrated spell checker
Integration of SOC & SIC coding (output of SOC coding influenced by SIC code)

Cascot Demonstration
  Highlight sections of GUI
  Show change SOC/SIC
  Enter some text, results, highlight
  Show load DLHE

Cascot Website
Cascot freely available over the web: www.warwick.ac.uk/go/cascot
Desktop version (for high volume use) coming soon.
Please register on the website.