Presentation on theme: "The development of Cascot: Computer Aided Structured Coding Tool"— Presentation transcript:
1 The development of Cascot: Computer Aided Structured Coding Tool
   Rob Jones, Institute for Employment Research, University of Warwick
   Introduction
   Brief history
   How and when I became involved (started last September)
2 Project History
   Two programs
      Casoc (SOC 2000 coding)
      Casic (SIC 92 coding)
      Undocumented, monolithic, legacy DOS code
   Development undertaken
      Testing framework
      Modularisation (object orientation)
      Integration of Casoc and Casic into one
      New GUI
3 Current Status
   Single program, “Cascot”
      Capable of SOC 2000 and SIC 92 coding
      Capable of coding to any classification
   Single scoring engine
   Loadable classifications
      Structure
      Index
      Rules
   Optional interfaces: web page and desktop application
4 Classification: Structure
   Nature of the classification
   Example: SOC 2000 has 4 levels, each entry a code and a title
      1 Managers and Senior Officials
         11 Corporate Managers
            111 Corporate Managers And Senior Officials
               1111 Senior officials in national government
               1112 Directors and chief executives of major organisations
               1113 Senior officials in local government
               1114 Senior officials of special interest organisations
            112 Production Managers
               1121 Production, works and maintenance managers
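The example above suggests that each level's code is a prefix of the codes beneath it. Assuming that holds, the structure can be sketched as a simple lookup from code to title. The names `SOC_TITLES` and `ancestors` are hypothetical and are not taken from Cascot itself.

```python
# Illustrative sketch only: a handful of SOC 2000 entries copied from the slide.
# Assumption: a code's parent is the same code with its last digit removed.
SOC_TITLES = {
    "1": "Managers and Senior Officials",
    "11": "Corporate Managers",
    "111": "Corporate Managers And Senior Officials",
    "1111": "Senior officials in national government",
    "112": "Production Managers",
    "1121": "Production, works and maintenance managers",
}


def ancestors(code: str) -> list[tuple[str, str]]:
    """Return (code, title) for every level from the top level down to `code`."""
    return [(code[:n], SOC_TITLES[code[:n]]) for n in range(1, len(code) + 1)]


for level_code, title in ancestors("1121"):
    print(level_code, title)
```

A 4-digit code therefore yields four (code, title) pairs, one per level.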
5 Classification: Index
   Series of texts associated with given codes.
      2312 Teacher (educational establishments: college of education)
      2312 Teacher (further education)
      2312 Teacher (higher and further education)
      2312 Teacher (tertiary college)
      2312 Teacher, dance (further education)
      2312 Teacher, music (further education)
      2312 Teacher; head (educational establishments: further education)
      2312 Tutor (further education)
      2312 Tutor (higher and further education)
      2313 Adviser (education)
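One way to picture the index is as a list of (code, text) pairs, with a word lookup built on top so that the codes using a given word can be found quickly. This is a minimal sketch under that assumption. The names `INDEX`, `words` and `build_word_lookup` are hypothetical, and Cascot's real storage format is not described in these slides.

```python
import re
from collections import defaultdict

# A few index entries copied from the slide; the real index is far larger.
INDEX = [
    ("2312", "Teacher (further education)"),
    ("2312", "Tutor (further education)"),
    ("2313", "Adviser (education)"),
]


def words(text):
    """Split a text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_lookup(index):
    """Map each word to the set of codes whose index entries contain it."""
    lookup = defaultdict(set)
    for code, text in index:
        for word in words(text):
            lookup[word].add(code)
    return lookup


lookup = build_word_lookup(INDEX)
print(sorted(lookup["education"]))
```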
6 Classification: Rules
   Abbreviations, e.g. deli = delicatessen
   Misspellings, e.g. taylor = tailor
   Thesaurus alternatives, e.g. cook = (95%) chef
   Default values, e.g. BUSINESS MANAGER = company manager
   Non-concluding text, e.g. Owner
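The first three rule types can be sketched as word-level substitutions applied before the index is searched. This is an illustration only. It assumes that the "(95%)" on a thesaurus alternative acts as a weight on that alternative, which the slide does not spell out, and the table and function names are hypothetical.

```python
# Rule tables populated with the examples from the slide.
ABBREVIATIONS = {"deli": "delicatessen"}
MISSPELLINGS = {"taylor": "tailor"}
# Assumption: the percentage is a weight applied to the alternative word.
ALTERNATIVES = {"cook": [("chef", 0.95)]}


def expand_word(word):
    """Return (candidate_word, weight) pairs for one input word."""
    word = word.lower()
    word = ABBREVIATIONS.get(word, word)   # expand abbreviations
    word = MISSPELLINGS.get(word, word)    # correct known misspellings
    candidates = [(word, 1.0)]             # the word itself at full weight
    candidates.extend(ALTERNATIVES.get(word, []))  # weighted alternatives
    return candidates


print(expand_word("deli"))
print(expand_word("cook"))
```

Default values and non-concluding texts act on whole texts rather than single words, so they are left out of this sketch.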
8 Principles of operation
   Identify words.
   Select all codes where those words are used.
   Score all index entries in all those codes.
   Score comprised of
      Global component
      Record component
      2-way comparison (Text-2-Index & Index-2-Text)
   Final score (0-100) known as the 'Certainty Score'
   This is a simplified overview of the coding.
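The three steps above can be sketched in a few lines. The scoring formula here is a stand-in and not Cascot's own. It multiplies the share of input words found in the index entry (text-to-index) by the share of entry words found in the input (index-to-text), then scales to 0-100. The global and record components are not described in the slides and are omitted. All names are hypothetical.

```python
import re

# A few index entries copied from the Index slide.
INDEX = [
    ("2312", "Teacher (further education)"),
    ("2312", "Tutor (further education)"),
    ("2313", "Adviser (education)"),
]


def words(text):
    """Step 1: identify the words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def two_way_score(text, entry):
    """Stand-in two-way comparison, scaled to 0-100 (not Cascot's formula)."""
    t, e = words(text), words(entry)
    if not t or not e:
        return 0.0
    shared = len(t & e)
    text_to_index = shared / len(t)   # how much of the input the entry covers
    index_to_text = shared / len(e)   # how much of the entry the input covers
    return 100.0 * text_to_index * index_to_text


def code_text(text, index):
    """Return the best (score, code, entry) triple, or None if nothing matches."""
    t = words(text)
    # Step 2: select all codes where any of those words are used.
    candidate_codes = {code for code, entry in index if t & words(entry)}
    # Step 3: score all index entries in all those codes.
    scored = [(two_way_score(text, entry), code, entry)
              for code, entry in index if code in candidate_codes]
    return max(scored, default=None)


print(code_text("further education tutor", INDEX))
```

Multiplying the two directions penalises both an input with extra unmatched words and an index entry with extra unmatched words, which is one plausible reading of why the comparison is made both ways.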
9 Complexities of Scoring
   The coding is in fact a lot more complex than the previous overview might lead you to believe. Some of the complexities include:
   Rules
      Create alternatives.
      Non-concluding texts (in rules) => 39
   Words are 'pseudo matched' before being searched for
      E.g. miner matches mine, miner, mineral, minerals, mines
   Final score adjusted by next closest score
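The slides do not say how pseudo matching is done. As a rough illustration that happens to reproduce the "miner" example, the sketch below treats two words as a pseudo match when they share their first four letters. That rule, the `prefix_len` parameter and the function name are all assumptions. Cascot's real matcher may work quite differently.

```python
def pseudo_match(a, b, prefix_len=4):
    """Hypothetical pseudo match: the words share their first `prefix_len` letters.

    Words shorter than `prefix_len` must match exactly.
    """
    a, b = a.lower(), b.lower()
    if len(a) < prefix_len or len(b) < prefix_len:
        return a == b
    return a[:prefix_len] == b[:prefix_len]


candidates = ["mine", "miner", "mineral", "minerals", "mines", "minister"]
print([w for w in candidates if pseudo_match("miner", w)])
```

A fixed-prefix rule is crude and would also pair unrelated words that happen to start alike, which is presumably one reason the scoring needs the further adjustments listed above.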
10 Automatic & Assisted Modes
   File input/output
   Threshold level = certainty score
   Assisted mode
      score < threshold: user prompted
   Automatic mode
      score < threshold: no code written
   In addition to coding individual texts, it is possible to run Cascot so as to block code files containing 1,000s of texts. This can be done in two modes, Automatic and Assisted. The difference is that Automatic will do everything, while Assisted will prompt when appropriate.
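The threshold logic for the two modes can be sketched as below. The `coder` and `ask_user` callables are hypothetical stand-ins for Cascot's scoring engine and its user prompt. Only the below-threshold behaviour comes from the slide.

```python
def block_code(texts, coder, threshold, mode="automatic", ask_user=None):
    """Code a batch of texts.

    coder(text) must return (code, certainty_score).
    ask_user(text, suggested_code, score) must return the code chosen by the user.
    """
    if mode not in ("automatic", "assisted"):
        raise ValueError("mode must be 'automatic' or 'assisted'")
    if mode == "assisted" and ask_user is None:
        raise ValueError("assisted mode needs an ask_user callback")

    results = []
    for text in texts:
        code, score = coder(text)
        if score < threshold:
            if mode == "assisted":
                code = ask_user(text, code, score)  # user prompted
            else:
                code = None                         # no code written
        results.append((text, code))
    return results


# Toy coder for demonstration only: codes and scores are made up.
def toy_coder(text):
    return ("2312", 80) if "teacher" in text else ("0000", 20)


print(block_code(["teacher", "owner"], toy_coder, threshold=50))
```

Raising the threshold trades throughput for accuracy: fewer texts are coded without intervention, but those that are coded carry a higher certainty score.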
11 Performance
   Can be measured in many ways: speed, throughput, accuracy.
   Speed: approx. 1,000 texts / minute.
   Main test data
      LFS 96/97
      Total records: 63,251
      Compared to manual coding.
12 Automatic text processing (SOC 2000): Throughput and error rates by certainty score
13 Automatic text processing (SIC 92): Throughput and error rates by certainty score
14 The relationship between matching at SOC 2000 unit group level and the certainty score
   Chart: % matching at each value of certainty score (x-axis: Certainty Score)
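The charts on the three slides above are not reproduced in this transcript. As a sketch of how such figures could be derived, the function below assumes that throughput is the share of records whose certainty score reaches the threshold, and that the error rate is the share of those records whose automatic code differs from the manual code. Both definitions are assumptions, and the records shown are made-up toy values, not results from the LFS test data.

```python
def throughput_and_error(records, threshold):
    """records: iterable of (certainty_score, automatic_code, manual_code)."""
    records = list(records)
    coded = [r for r in records if r[0] >= threshold]
    throughput = len(coded) / len(records) if records else 0.0
    errors = sum(1 for _, auto, manual in coded if auto != manual)
    error_rate = errors / len(coded) if coded else 0.0
    return throughput, error_rate


# Toy records for illustration only.
TOY_RECORDS = [
    (95, "2312", "2312"),
    (80, "2312", "2313"),
    (60, "1111", "1111"),
    (30, "1121", "1112"),
]

for threshold in (50, 90):
    print(threshold, throughput_and_error(TOY_RECORDS, threshold))
```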
15 Future Work
   Classification editor
      Editing of rules
      Editing of structure, index entries
      Creation of new classifications
   Performance enhancements
      Integrated spell checker
      Integration of SOC & SIC coding (output of SOC coding influenced by SIC code)
16 Cascot Demonstration
   Highlight sections of GUI
   Show change SOC/SIC
   Enter some text, results, highlight
   Show load DLHE
17 Cascot Website
   Cascot freely available over the web.
   Desktop version (for high volume use) coming soon.
   Please register on the website.