Multi-language CASCOT Margaret Birch and Ritva Ellison Institute for Employment Research.

2 Computer Assisted Structured Coding Tool (CASCOT)
- Software tool for coding text automatically or manually
- Developed at the Institute for Employment Research, University of Warwick, since 1993
- Used by over 100 organisations in the UK and abroad

3 IER was contracted under the DASISH project to develop a multilingual version of CASCOT that codes job titles to ISCO-08
- This is a large task with limited resources, so it is a pilot project
- The 8 selected languages:
  - Dutch (Netherlands, Flemish Belgium)
  - English
  - Finnish
  - French (France, Walloon Belgium, Switzerland)
  - German (Germany, Austria, Switzerland)
  - Italian
  - Slovak
  - Spanish

4 Key Tasks
- Translating the Cascot user interface texts
- Constructing national-language versions of the ISCO-08 structure for Cascot
- Indexing job titles in the selected languages to ISCO-08
  - Some supplied by national statistical institutes (NSIs) or other partners
  - Some found by exploring relevant national websites
- Validating the software using raw data files from the European Social Survey (ESS) Round 6
- Testing the Cascot multilingual software
- Developing language-based coding rules
- Using the Cascot Performance Tool to fine-tune the software

5 Coding with Cascot
- Enter text (typed in or read from a file)
- Cascot recommends a code, but the user can change it
- Output can be directed to a file
- [Screenshot callout: Selected classification]
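The workflow on this slide (text in, a recommended code with a score out, an optional user correction, results written to a file) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Cascot's method: the three-entry index, the word-overlap score and the function names `recommend` and `code_records` are hypothetical.

```python
import csv
import io

# Hypothetical toy index: job-title phrases mapped to ISCO-08 unit group codes.
TOY_INDEX = {
    "waiter": "5131",
    "tailor": "7531",
    "software developer": "2512",
}


def recommend(text):
    """Return (code, score) for the index phrase sharing most words with text.

    The score is an assumed 0-100 word-overlap measure; (None, 0) means no match.
    """
    words = set(text.lower().split())
    best_code, best_score = None, 0
    for phrase, code in TOY_INDEX.items():
        phrase_words = set(phrase.split())
        score = round(100 * len(words & phrase_words) / len(phrase_words))
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


def code_records(titles, overrides=None):
    """Code each title and return tab-delimited rows: title, code, score.

    overrides maps a row index to a code chosen by the user, which replaces
    the recommended code (the score of the recommendation is kept).
    """
    overrides = overrides or {}
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, title in enumerate(titles):
        code, score = recommend(title)
        writer.writerow([title, overrides.get(i, code), score])
    return out.getvalue()


# The third title has no index match, so the user supplies a code.
print(code_records(["Senior software developer", "Waiter", "Checkout operator"],
                   overrides={2: "5230"}))
```

Keeping the score next to the (possibly overridden) code mirrors the idea that the user reviews Cascot's recommendation rather than replacing the tool's output format.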

6 Multi-language Cascot
- 8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish
- Cascot detects the language automatically, but it can be changed from the menu
- An ISCO-08 classification exists for each country (some with a national code)
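Automatic language detection with a manual override could work roughly as follows. The slide does not say how Cascot detects the language, so this sketch simply assumes a vote based on how many words of the input appear in each language's vocabulary; `VOCAB` and `detect_language` are hypothetical names, and the `override` argument stands in for the menu choice.

```python
# Hypothetical per-language vocabularies; a real system would more likely use
# the full job-title index for each language than a handful of words.
VOCAB = {
    "English": {"teacher", "nurse", "driver", "manager"},
    "German": {"lehrer", "krankenschwester", "fahrer", "leiter"},
    "French": {"enseignant", "infirmière", "chauffeur", "directeur"},
}


def detect_language(text, override=None):
    """Guess the language of a job title by counting vocabulary matches.

    An explicit override (like choosing a language from the menu) always wins.
    Returns None when no word matches any vocabulary; ties go to the language
    listed first in VOCAB.
    """
    if override is not None:
        return override
    words = text.lower().split()
    counts = {lang: sum(w in vocab for w in words) for lang, vocab in VOCAB.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(detect_language("Lehrer an einer Grundschule"))
```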

7 Coding in Dutch

8 Finnish

9 French

10 German*
* The index is © Federal Employment Agency

11 Italian

12 Slovak

13 Spanish

14 A test of multi-language Cascot
- Comparison of the code assigned in European Social Survey (ESS) Round 6 with the automatic Cascot code
- Data available from DE, ES, GB and NL (ISCO-08)
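Because ISCO-08 codes are hierarchical (the first digit is the major group, two digits the sub-major group, three the minor group and four the unit group), a comparison such as this one can report agreement at each level. The sketch below assumes 4-digit codes stored as strings; the function name `agreement_by_level` is hypothetical, and the slide does not say which levels were actually compared.

```python
def agreement_by_level(pairs):
    """Share of (survey_code, cascot_code) pairs that agree at each ISCO-08 level.

    Level n compares the first n digits: 1 = major group, 2 = sub-major group,
    3 = minor group, 4 = unit group. Codes are assumed to be 4-digit strings.
    """
    if not pairs:
        raise ValueError("no code pairs to compare")
    result = {}
    for level in (1, 2, 3, 4):
        matches = sum(a[:level] == b[:level] for a, b in pairs)
        result[level] = matches / len(pairs)
    return result


# Invented pairs for illustration only (not ESS data).
print(agreement_by_level([("2512", "2512"), ("2512", "2511"), ("5131", "5120")]))
```

Reporting agreement at coarser levels shows whether disagreements are near misses (same minor group) or fundamentally different occupations.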

15 Cascot Performance Tool
- Allows the user to analyse Cascot's performance by comparing manually coded data with the codes Cascot produces for the same data
- Requires a delimited results file containing a reference code, the Cascot code and the Cascot score
- The Tool shows a Performance Results Display window with a Performance Graph, Summary, Statistics and Key
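A results file of this kind supports a simple analysis: for each score threshold, the share of records scoring at or above it (coverage) and how often those codes match the reference (accuracy). The sketch below assumes a comma-delimited file with the columns in the order reference code, Cascot code, Cascot score; `performance_by_threshold` is a hypothetical name, and the sketch is not claimed to reproduce the Tool's Performance Graph.

```python
import csv
import io


def performance_by_threshold(results_text, thresholds, delimiter=","):
    """Return (threshold, coverage, accuracy) tuples for each threshold.

    Coverage = share of records with score >= threshold.
    Accuracy = share of those records where the Cascot code equals the
    reference code (None if no record reaches the threshold).
    """
    rows = []
    for row in csv.reader(io.StringIO(results_text), delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        ref, code, score = row[:3]
        rows.append((ref.strip(), code.strip(), float(score)))

    report = []
    for t in thresholds:
        kept = [(ref, code) for ref, code, score in rows if score >= t]
        coverage = len(kept) / len(rows) if rows else 0.0
        accuracy = sum(ref == code for ref, code in kept) / len(kept) if kept else None
        report.append((t, coverage, accuracy))
    return report


# Invented rows for illustration: reference code, Cascot code, Cascot score.
sample = "2512,2512,85\n5131,5131,60\n7531,7533,45\n2166,,20\n"
for line in performance_by_threshold(sample, [0, 50, 80]):
    print(line)
```

Raising the threshold typically trades coverage for accuracy, which is why both quantities are tracked together.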

16 Opening a results file

17 Performance Results Display
- The longer the green line stays high, the better
- The further to the right the purple/blue lines are, the better

18 Fine-tuning multi-language Cascot
- The versions in different languages could be improved by developing coding rules
- This needs contributions from experts who know each language
- Rules are developed with the Cascot Editor

19 Cascot Editor
- Classification files for Cascot are created and modified with the Editor
- Each classification has a Structure, an Index and Rules for coding

20 Cascot Editor Rules
- Downgraded words: words considered significantly less important than other words, e.g. deputy, junior, person
- Equivalent word ends: wait|er, wait|ress
- Abbreviations: asst → assistant, fe → further education
- Replacement words: taylor → tailor, tesco → supermarket
  - Omitting noise words, e.g. replace 'part-time' with nothing
- Input modifications: used only when the rule absolutely cannot be made elsewhere
- Word alternatives: words and phrases that should also be tried as possible solution candidates
- Conclusions: retired → cannot conclude, agent → ambiguous (score 39)
- Default coding: a set of words and phrases that should be scored as though they were a different word or phrase
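Several of these rule types amount to rewriting the input before it is matched against the index. The Python sketch below mimics abbreviations, replacement words (including an empty replacement that drops a noise word), equivalent word ends and downgraded words, using the slide's own examples. The rule tables, their order of application and the function names are assumptions for illustration; the Editor's actual rule engine is not described here.

```python
# Hypothetical rule tables built from the slide's examples.
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket", "part-time": ""}
EQUIVALENT_ENDS = {"wait": ["er", "ress"]}  # wait|er, wait|ress
DOWNGRADED = {"deputy", "junior", "person"}


def canonical_word(word):
    """Map equivalent word ends onto the first listed ending (waitress -> waiter)."""
    for stem, endings in EQUIVALENT_ENDS.items():
        if any(word == stem + e for e in endings):
            return stem + endings[0]
    return word


def apply_rules(text):
    """Return (main_words, downgraded_words) after applying the toy rules.

    Assumed order: abbreviations, then replacements (an empty replacement
    drops a noise word), then equivalent ends. Downgraded words are kept
    separately so that a scorer could give them less weight.
    """
    main, downgraded = [], []
    for word in text.lower().split():
        word = ABBREVIATIONS.get(word, word)
        word = REPLACEMENTS.get(word, word)
        for part in word.split():  # an expansion may yield several words
            part = canonical_word(part)
            (downgraded if part in DOWNGRADED else main).append(part)
    return main, downgraded


print(apply_rules("Part-time asst Waitress"))
```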

21 Example of a new rule - English
- The problem: [screenshot not reproduced]
- Add two new Replacement Words rules: [screenshot not reproduced]
- The result: [screenshot not reproduced]

22 Potential for rules - German
German occupational titles were coded fully automatically with Cascot, and the results were compared with approved codes. Examples where rules would improve Cascot's coding performance were shown in a table (not reproduced here).
It is helpful to have “gold standard” files containing a large number of real-life job titles to which experts have assigned the correct codes. Cascot's coding results can be compared with the “gold standard” to find areas for improvement.
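Comparing Cascot output with a gold standard file can be made more targeted by counting which (gold code, Cascot code) disagreements occur most often, since frequent confusions are natural candidates for new rules. This is a minimal sketch: the record layout and the function name `top_disagreements` are hypothetical, and the titles and codes in the usage example are invented.

```python
from collections import Counter


def top_disagreements(records, n=3):
    """Rank (gold_code, cascot_code) pairs where Cascot disagrees with the gold standard.

    records: iterable of (job_title, gold_code, cascot_code).
    Returns up to n ((gold_code, cascot_code), count, example_titles) tuples,
    most frequent first; ties keep the order in which pairs were first seen.
    """
    counts = Counter()
    examples = {}
    for title, gold, cascot in records:
        if gold != cascot:
            key = (gold, cascot)
            counts[key] += 1
            examples.setdefault(key, []).append(title)
    return [(key, count, examples[key]) for key, count in counts.most_common(n)]


# Invented records for illustration only.
records = [
    ("Care assistant", "5322", "3221"),
    ("Care worker", "5322", "3221"),
    ("Cook", "5120", "5120"),
    ("Office clerk", "4110", "3343"),
]
for row in top_disagreements(records):
    print(row)
```

Keeping example titles alongside each count gives a language expert concrete inputs to test a proposed rule against.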
