The development of Cascot: Computer Aided Structured Coding Tool

1 The development of Cascot: Computer Aided Structured Coding Tool
Rob Jones, Institute for Employment Research, University of Warwick
Introduction: brief history; how and when I became involved (started last September).

2 Project History
Two programs: Casoc (SOC 2000 coding) and Casic (SIC 92 coding). Both were undocumented, monolithic, legacy DOS code.
Development undertaken: testing framework; modularisation (object orientation); integration of Casoc and Casic into one program; new GUI.

3 Current Status
Single program, "Cascot": capable of SOC 2000 and SIC 92 coding, or coding to any classification.
Single scoring engine.
Loadable classifications, each consisting of a structure, an index and rules.
Optional interfaces: web page and desktop application.

4 Classification: Structure
Nature of the classification. Example: SOC 2000 has 4 levels, each entry consisting of a code and a title.
1 Managers and Senior Officials
11 Corporate Managers
111 Corporate Managers And Senior Officials
1111 Senior officials in national government
1112 Directors and chief executives of major organisations
1113 Senior officials in local government
1114 Senior officials of special interest organisations
112 Production Managers
1121 Production, works and maintenance managers
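The example suggests that each level of the structure extends the code of the level above by one digit. A minimal sketch under that assumption follows; the dictionary layout and the function names `ancestors` and `path` are illustrative and are not taken from Cascot itself.

```python
# Hypothetical in-memory form of a small slice of the SOC 2000 structure.
STRUCTURE = {
    "1": "Managers and Senior Officials",
    "11": "Corporate Managers",
    "111": "Corporate Managers And Senior Officials",
    "1111": "Senior officials in national government",
    "1112": "Directors and chief executives of major organisations",
    "112": "Production Managers",
    "1121": "Production, works and maintenance managers",
}


def ancestors(code):
    """Return the codes of all higher levels, highest level first.

    Assumes a parent code is always a prefix of its child code.
    """
    return [code[:i] for i in range(1, len(code)) if code[:i] in STRUCTURE]


def path(code):
    """Return 'code title' strings from the top level down to `code`."""
    return [f"{c} {STRUCTURE[c]}" for c in ancestors(code) + [code]]


for line in path("1112"):
    print(line)
```

Storing the structure as a flat code-to-title mapping keeps lookups simple; the hierarchy is recovered from the codes rather than stored separately.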

5 Classification: Index
Series of texts associated with given codes.
2312 Teacher (educational establishments: college of education)
2312 Teacher (further education)
2312 Teacher (higher and further education)
2312 Teacher (tertiary college)
2312 Teacher, dance (further education)
2312 Teacher, music (further education)
2312 Teacher; head (educational establishments: further education)
2312 Tutor (further education)
2312 Tutor (higher and further education)
2313 Adviser (education)
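One way to picture the index is as a list of (code, text) pairs with a word lookup built on top of it. The sketch below is an assumption about a convenient representation, not a description of Cascot's internal format; `words` and `build_word_lookup` are hypothetical helper names.

```python
import re
from collections import defaultdict

# A few index entries copied from the slide, as (code, text) pairs.
INDEX = [
    ("2312", "Teacher (further education)"),
    ("2312", "Teacher, dance (further education)"),
    ("2312", "Tutor (further education)"),
    ("2313", "Adviser (education)"),
]


def words(text):
    """Split a text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_lookup(index):
    """Map each word to the set of codes whose index entries use it."""
    lookup = defaultdict(set)
    for code, text in index:
        for word in words(text):
            lookup[word].add(code)
    return lookup


lookup = build_word_lookup(INDEX)
print(sorted(lookup["education"]))
```

A lookup of this kind makes it cheap to find every code in which a given word is used.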

6 Classification: Rules
Abbreviations, e.g. deli = delicatessen
Misspellings, e.g. taylor = tailor
Thesaurus alternatives, e.g. cook = (95%) chef
Default values, e.g. BUSINESS MANAGER = company manager
Non-concluding text, e.g. Owner

7 Classification: Rules
Downgraded words, e.g. Trainee, Assistant, Senior
Noise words, e.g. and, of, with, in, at, the
Noise phrases, e.g. "My Mother is"
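Several of these rule types can be pictured as a preprocessing pass over the input text. The sketch below covers abbreviations, misspellings, noise words, noise phrases and downgraded words only; the order in which the rules are applied and the weight of 0.5 for downgraded words are assumptions made for illustration, since the slides do not state them.

```python
import re

# Rule tables populated with the examples from the slides.
ABBREVIATIONS = {"deli": "delicatessen"}
MISSPELLINGS = {"taylor": "tailor"}
NOISE_WORDS = {"and", "of", "with", "in", "at", "the"}
NOISE_PHRASES = ["my mother is"]
DOWNGRADED_WORDS = {"trainee", "assistant", "senior"}

DOWNGRADED_WEIGHT = 0.5  # assumed value; the real weighting is not given


def preprocess(text):
    """Return (word, weight) pairs after applying the rule tables."""
    lowered = text.lower()
    for phrase in NOISE_PHRASES:
        lowered = lowered.replace(phrase, " ")
    result = []
    for word in re.findall(r"[a-z]+", lowered):
        word = ABBREVIATIONS.get(word, word)
        word = MISSPELLINGS.get(word, word)
        if word in NOISE_WORDS:
            continue  # noise words carry no information for coding
        weight = DOWNGRADED_WEIGHT if word in DOWNGRADED_WORDS else 1.0
        result.append((word, weight))
    return result


print(preprocess("My Mother is Assistant Taylor in the Deli"))
```

Thesaurus alternatives, default values and non-concluding texts are left out because they affect scoring rather than simple word cleanup.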

8 Principles of operation
This is a simplified overview of the coding.
Identify words.
Select all codes where those words are used.
Score all index entries in all those codes.
Score comprised of: global component; record component.
2-way comparison (text-to-index and index-to-text).
Final score (0-100), known as the 'Certainty Score'.
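The steps above can be sketched roughly as follows. This is not Cascot's scoring engine: the global and record components are not described in the slides, so the sketch uses a plain word-overlap measure in both directions (text-to-index and index-to-text) and multiplies the two fractions to obtain a 0-100 value. The formula and the names `score` and `best_match` are assumptions.

```python
import re

INDEX = [
    ("2312", "Teacher (further education)"),
    ("2312", "Teacher, dance (further education)"),
    ("2312", "Tutor (further education)"),
    ("2313", "Adviser (education)"),
]


def word_set(text):
    """Identify the distinct lower-case words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(text, entry):
    """Two-way overlap score between an input text and an index entry."""
    text_words, entry_words = word_set(text), word_set(entry)
    if not text_words or not entry_words:
        return 0
    common = text_words & entry_words
    text_to_index = len(common) / len(text_words)   # share of text found in entry
    index_to_text = len(common) / len(entry_words)  # share of entry found in text
    return round(100 * text_to_index * index_to_text)


def best_match(text, index):
    """Score every entry sharing a word with the text; return the best."""
    candidates = [
        (score(text, entry), code, entry)
        for code, entry in index
        if word_set(text) & word_set(entry)
    ]
    return max(candidates, default=None)


print(best_match("dance teacher further education", INDEX))
```

Comparing in both directions penalises an index entry that contains many words absent from the input, as well as an input that contains words absent from the entry.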

9 Complexities of Scoring
The coding is in fact a lot more complex than the previous overview might lead you to believe. Some of the complexities include:
Rules create alternatives.
Non-concluding texts (in rules) => 39
Words are 'pseudo matched' before being searched for, e.g. miner matches mine, miner, mineral, minerals, mines
Final score adjusted by next closest score
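The slide does not say how pseudo matching decides that two words are related. One rule that happens to reproduce the "miner" example is to treat words as matching when they share a common prefix of at least four letters. The sketch below implements only that assumed rule; the `min_prefix` parameter and both function names are hypothetical.

```python
def common_prefix_len(a, b):
    """Count the leading characters that two words have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pseudo_match(word, vocabulary, min_prefix=4):
    """Return vocabulary words sharing at least `min_prefix` leading letters.

    An assumed matching rule, chosen only because it reproduces the
    example on the slide.
    """
    word = word.lower()
    return sorted(
        v for v in vocabulary
        if common_prefix_len(word, v.lower()) >= min_prefix
    )


VOCABULARY = ["mine", "miner", "mineral", "minerals", "mines", "minister", "mint"]
print(pseudo_match("miner", VOCABULARY))
```

With this rule "minister" and "mint" are rejected because they share only three leading letters with "miner".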

10 Automatic & Assisted Modes
In addition to coding individual texts, it is possible to run Cascot so as to block-code files containing thousands of texts. This can be done in two modes: ..... The difference is that Automatic will do everything ..... Assisted will prompt when appropriate.
File input/output.
Threshold level = certainty score.
Assisted mode: score < threshold, user prompted.
Automatic mode: score < threshold, no code written.
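The threshold behaviour of the two modes can be sketched as a single decision function. The function name `decide`, its parameters and the `ask_user` callback are hypothetical; the slide states only what happens when the score is below the threshold, so accepting the code when the score is at or above the threshold is an assumption.

```python
def decide(text, best_code, certainty, threshold, mode, ask_user=None):
    """Return the code to write for one record, or None to write no code."""
    if certainty >= threshold:
        return best_code  # assumed: confident enough to accept in either mode
    if mode == "assisted":
        if ask_user is None:
            raise ValueError("assisted mode needs an ask_user callback")
        return ask_user(text, best_code, certainty)  # user is prompted
    if mode == "automatic":
        return None  # below threshold: no code written
    raise ValueError(f"unknown mode: {mode!r}")


# Usage: a stand-in for the interactive prompt that always answers "1239".
def always_1239(text, suggestion, certainty):
    return "1239"


print(decide("tutor", "2312", 80, 50, "automatic"))
print(decide("owner", "1111", 30, 50, "automatic"))
print(decide("owner", "1111", 30, 50, "assisted", ask_user=always_1239))
```

Passing the prompt in as a callback keeps the threshold logic independent of whichever interface is doing the prompting.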

11 Performance
Can be measured in many ways: speed, throughput, accuracy.
Speed: approx. 1,000 texts/minute.
Main test data: LFS 96/97. Total records: 63251. Compared to manual coding.

12 Automatic text processing (SOC 2000): Throughput and error rates by certainty score

13 Automatic text processing (SIC 92): Throughput and error rates by certainty score

14 The relationship between matching at SOC2000 unit group level and the certainty score
[Figure. Axis labels: "% matching at each value of certainty score" and "Certainty Score".]
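Throughput and error rate at a given certainty threshold could be computed along the following lines. The definitions used here are assumptions: throughput is taken as the share of records scoring at or above the threshold, and error rate as the share of those accepted records whose automatic code differs from the manual code. The sample records are invented for illustration and are not LFS data.

```python
def throughput_and_error(records, threshold):
    """Return (throughput, error_rate) for a certainty threshold.

    `records` holds (certainty, automatic_code, manual_code) tuples.
    """
    accepted = [r for r in records if r[0] >= threshold]
    throughput = len(accepted) / len(records) if records else 0.0
    errors = sum(1 for _, auto, manual in accepted if auto != manual)
    error_rate = errors / len(accepted) if accepted else 0.0
    return throughput, error_rate


# Invented sample data, for illustration only.
SAMPLE = [
    (90, "2312", "2312"),
    (80, "2312", "2313"),
    (40, "1111", "1111"),
    (30, "1121", "1112"),
]

for threshold in (0, 50, 85):
    print(threshold, throughput_and_error(SAMPLE, threshold))
```

Raising the threshold trades throughput for accuracy: fewer records are coded automatically, but those that are coded tend to agree more often with manual coding.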

15 Future Work
Classification editor: editing of rules; editing of structure and index entries; creation of new classifications.
Performance enhancements.
Integrated spell checker.
Integration of SOC & SIC coding (output of SOC coding influenced by SIC code).

16 Cascot Demonstration
Highlight sections of GUI; show change SOC/SIC; enter some text, results, highlight; show load DLHE.

17 Cascot Website
Cascot is freely available over the web. Desktop version (for high-volume use) coming soon. Please register on the website.
