
UCREL: from LOB to REVERE. Paul Rayson, CSEG awayday, November 1999.


1 UCREL: from LOB to REVERE
Paul Rayson

2 A brief history of UCREL
In ten minutes, I will present a brief history of UCREL (the University Centre for Computer Corpus Research on Language), which has members in the Computing and Linguistics Departments. UCREL specialises in the automatic or computer-aided analysis of large bodies of naturally occurring language ('corpora') and has a record of more than twenty years of pioneering work in this field, from the first million-word corpus of British English (LOB) in 1978 up to the present-day REVERE project, which is piloting the application of UCREL's techniques to texts in the requirements engineering domain.
[Timeline: 1960 to 1999]

3 Noam Chomsky
Chomsky changed the direction of linguistics away from empiricism and towards rationalism
Observation of naturally occurring data versus a theory of how human language processing is actually undertaken
Model competence rather than performance
Early corpus linguists saw language as finite and collected it all!

4 Brown and LOB
One-million-word machine-readable corpora:
– Brown corpus (American English)
– Lancaster-Oslo-Bergen corpus (British English)

5 Grammatical analysis
Statistical word-class tagging by CLAWS: 98% accuracy, whereas simple rule-based systems reached only the low 90s
Manual full parse: 3 million words for training

6 Speech recognition
In collaboration with IBM’s continuous speech recognition group
Produced a detailed analysis of spoken data and a grammar of spoken English

7 Word sense tagging
Automatic tagger to assign semantic tags
Tags differentiate dictionary word senses to an accuracy of 91%
Full text, hybrid rule-based and statistical
Application to market research

8 The next generation
British National Corpus
One hundred million words
All tagged at Lancaster

9 REVERE
Application of UCREL’s techniques to the requirements engineering domain
Legacy documents (specifications, ethnographic reports, manuals)
Assisting the requirements engineer (RE) to extract domain knowledge

10 UCREL consultancy
CLAWS licences
Tagging services
– 146 million words processed

