The development of Cascot: Computer Aided Structured Coding Tool Rob Jones Institute for Employment Research University of Warwick.

1 The development of Cascot: Computer Aided Structured Coding Tool Rob Jones Institute for Employment Research University of Warwick

2 Project History Two programs: Casoc (SOC 2000 coding) and Casic (SIC 92 coding). Undocumented, monolithic, legacy DOS code. Development undertaken – Testing framework – Modularisation (object orientation) – Integration of Casoc and Casic into one program – New GUI

3 Current Status Single program, Cascot – Capable of SOC 2000 & SIC 92 coding – Any classification. Single scoring engine. Loadable classifications – Structure – Index – Rules. Optional interfaces: web page & desktop application.

4 Classification: Structure Nature of the classification – Example. SOC 2000: 4 levels, Code & Title
1 Managers and Senior Officials
11 Corporate Managers
111 Corporate Managers And Senior Officials
1111 Senior officials in national government
1112 Directors and chief executives of major organisations
1113 Senior officials in local government
1114 Senior officials of special interest organisations
112 Production Managers
1121 Production, works and maintenance managers
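In the excerpt above each level adds one digit to the code, so the parents of a unit group can be read off as the prefixes of its code. A minimal sketch of that idea, assuming codes are held as strings in a plain dictionary; the names `STRUCTURE`, `level` and `ancestors` are hypothetical and are not taken from Cascot:

```python
# Hypothetical in-memory form of the SOC 2000 excerpt: code -> title.
STRUCTURE = {
    "1": "Managers and Senior Officials",
    "11": "Corporate Managers",
    "111": "Corporate Managers And Senior Officials",
    "1111": "Senior officials in national government",
    "1112": "Directors and chief executives of major organisations",
    "112": "Production Managers",
    "1121": "Production, works and maintenance managers",
}


def level(code):
    """Level in the hierarchy, assuming one digit per level (1 to 4)."""
    return len(code)


def ancestors(code):
    """Codes of all higher-level groups containing `code`, top level first."""
    return [code[:n] for n in range(1, len(code)) if code[:n] in STRUCTURE]


for parent in ancestors("1112"):
    print(parent, STRUCTURE[parent])
```

Storing codes as strings rather than integers keeps the prefix relationship explicit and avoids losing any leading zeros a classification might use.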

5 Classification: Index Series of texts associated with given codes.
2312 Teacher (educational establishments: college of education)
2312 Teacher (further education)
2312 Teacher (higher and further education)
2312 Teacher (tertiary college)
2312 Teacher, dance (further education)
2312 Teacher, music (further education)
2312 Teacher; head (educational establishments: further education)
2312 Tutor (further education)
2312 Tutor (higher and further education)
2313 Adviser (education)

6 Classification: Rules Abbreviations – e.g. deli = delicatessen Misspellings – e.g. taylor = tailor Thesaurus alternatives – e.g. cook = (95%) chef Default values – e.g. BUSINESS MANAGER = company manager Non-concluding text – e.g. Owner

7 Classification: Rules Downgraded words – e.g. Trainee, Assistant, Senior Noise words – e.g. and, of, with, in, at, the Noise phrases – e.g. "My Mother is"
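A few of these rule types can be pictured as a text clean-up pass applied before any matching. The sketch below is illustrative only: the rule tables hold just the examples from the slides, the order in which rules are applied is an assumption, and "downgraded" is assumed to mean "set aside so that a scorer could weight the word less". The function name `normalise` is hypothetical.

```python
# Rule tables containing only the examples given on the slides.
ABBREVIATIONS = {"deli": "delicatessen"}
MISSPELLINGS = {"taylor": "tailor"}
NOISE_WORDS = {"and", "of", "with", "in", "at", "the"}
NOISE_PHRASES = ["my mother is"]
DOWNGRADED_WORDS = {"trainee", "assistant", "senior"}


def normalise(text):
    """Return (key_words, downgraded_words) for a raw description.

    Assumed order: strip noise phrases, expand abbreviations,
    correct misspellings, drop noise words, separate downgraded words.
    """
    lowered = text.lower()
    for phrase in NOISE_PHRASES:
        lowered = lowered.replace(phrase, " ")

    key_words, downgraded = [], []
    for word in lowered.split():
        word = ABBREVIATIONS.get(word, word)
        word = MISSPELLINGS.get(word, word)
        if word in NOISE_WORDS:
            continue
        if word in DOWNGRADED_WORDS:
            downgraded.append(word)
        else:
            key_words.append(word)
    return key_words, downgraded


print(normalise("My Mother is trainee taylor at the deli"))
```

Thesaurus alternatives, default values and non-concluding texts are left out here because the slides do not say how they feed into the score.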

8 Principles of operation Identify words. Select all codes where those words are used. Score all index entries in all those codes. Score comprises – Global component – Record component. Two-way comparison (text-to-index & index-to-text). Final score (0-100) known as the 'Certainty Score'.
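The steps on this slide can be sketched roughly as follows. This is not Cascot's scoring engine: the slides do not define the global and record components, so the score below is a stand-in that simply averages the two directions of word overlap and scales the result to 0-100. The names `INDEX`, `words`, `two_way_score` and `code_text` are hypothetical.

```python
import re

# A few index entries taken from the Classification: Index slide.
INDEX = [
    ("2312", "Teacher (further education)"),
    ("2312", "Tutor (higher and further education)"),
    ("2313", "Adviser (education)"),
]
NOISE = {"and", "of", "with", "in", "at", "the"}


def words(text):
    """Identify words: lower-case alphabetic tokens minus noise words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in NOISE}


def two_way_score(text, entry):
    """Stand-in score: mean of text-to-index and index-to-text overlap."""
    t, e = words(text), words(entry)
    if not t or not e:
        return 0.0
    common = len(t & e)
    text_to_index = common / len(t)   # share of input words found in entry
    index_to_text = common / len(e)   # share of entry words found in input
    return 100 * (text_to_index + index_to_text) / 2


def code_text(text):
    """Select codes sharing a word with the text, then score their entries."""
    t = words(text)
    candidates = {code for code, entry in INDEX if t & words(entry)}
    scored = [(two_way_score(text, entry), code, entry)
              for code, entry in INDEX if code in candidates]
    return sorted(scored, reverse=True)


for score, code, entry in code_text("further education teacher"):
    print(round(score), code, entry)
```

Comparing in both directions penalises an index entry that contains the input words but also many others, as well as an input that only partly matches a short entry.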

9 Complexities of Scoring Rules – Create alternatives. – Non-concluding texts (in rules) => 39. Words are 'pseudo-matched' before being searched for – e.g. miner matches mine, miner, mineral, minerals, mines. Final score adjusted by next closest score.
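The slides do not state the rule behind pseudo-matching. As a rough illustration only, the sketch below treats two words as pseudo-matched when they share their first four letters, a stand-in chosen purely because it reproduces the "miner" example; `PREFIX_LENGTH` and `pseudo_matches` are hypothetical names.

```python
PREFIX_LENGTH = 4  # assumed value, picked to reproduce the slide's example


def pseudo_matches(word, vocabulary):
    """Return the vocabulary words sharing the first PREFIX_LENGTH letters."""
    key = word.lower()[:PREFIX_LENGTH]
    return sorted(v for v in vocabulary if v.lower()[:PREFIX_LENGTH] == key)


vocabulary = ["mine", "miner", "mineral", "minerals", "mines",
              "minister", "mint"]
print(pseudo_matches("miner", vocabulary))
```

A fixed-length prefix is crude: it would also link unrelated words that happen to start alike, which is one reason a real tool might prefer a different matching rule.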

10 Automatic & Assisted Modes File Input/Output Threshold level = certainty score Assisted mode – score < threshold : user prompted Automatic mode – score < threshold : No code written
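The difference between the two modes comes down to what happens when the certainty score falls below the threshold. A minimal sketch of that decision, assuming a score equal to the threshold is accepted; the function `decide` and the `ask_user` callback are hypothetical names used only for illustration:

```python
def decide(code, certainty, threshold, mode, ask_user=None):
    """Return the code to write to the output file, or None to write nothing.

    mode is "automatic" or "assisted". In assisted mode, `ask_user` is a
    callable that receives the suggested code and its certainty score and
    returns the code chosen by the operator.
    """
    if certainty >= threshold:
        return code                      # confident enough in either mode
    if mode == "assisted":
        return ask_user(code, certainty)  # score < threshold: prompt the user
    return None                          # automatic: no code written


# Automatic mode leaves a low-scoring record uncoded.
print(decide("2312", 40, 60, "automatic"))
# Assisted mode hands the same record to the operator.
print(decide("2312", 40, 60, "assisted", ask_user=lambda c, s: "2313"))
```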

11 Performance Can be measured in many ways – speed, throughput, accuracy. Speed: approx. 1,000 texts / minute. Main test data – LFS 96/97 – Total Records : – Compared to manual coding.

12 Certainty Score Automatic text processing (SOC 2000): Throughput and error rates by certainty score

13 Certainty Score Automatic text processing (SIC 92): Throughput and error rates by certainty score
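Throughput and error rate at a given certainty score could be computed along the following lines. The definitions are assumptions, since the slides do not spell them out: throughput is taken as the share of records whose score reaches the threshold, and error rate as the share of those coded records whose automatic code differs from the manual code. The records are made-up placeholders, not LFS data, and `throughput_and_error` is a hypothetical name.

```python
def throughput_and_error(records, threshold):
    """records: iterable of (certainty, automatic_code, manual_code).

    Returns (throughput, error_rate) as fractions between 0 and 1.
    """
    records = list(records)
    if not records:
        return 0.0, 0.0
    coded = [r for r in records if r[0] >= threshold]
    throughput = len(coded) / len(records)
    errors = sum(1 for _, auto, manual in coded if auto != manual)
    error_rate = errors / len(coded) if coded else 0.0
    return throughput, error_rate


# Made-up records for illustration only.
records = [
    (95, "2312", "2312"),
    (80, "2313", "2313"),
    (60, "1121", "1112"),
    (30, "1111", "1113"),
]
for threshold in (30, 50, 90):
    print(threshold, throughput_and_error(records, threshold))
```

Raising the threshold trades throughput for accuracy: fewer records are coded automatically, but a smaller share of them disagree with the manual code.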

14 The relationship between matching at SOC 2000 unit group level and the certainty score [Figure. Axis labels: "% matching at each value of certainty score", "Certainty Score"]

15 Future Work Classification Editor – Editing of rules – Editing of structure, index entries – Creation of new classifications Performance enhancements – Integrated spell checker Integration of SOC & SIC coding (Output of SOC coding influenced by SIC code)

16 Cascot Demonstration

17 Cascot Website Cascot freely available over the web Desktop version (for high volume use) coming soon. Please register on the website.

