

UCREL: from LOB to REVERE
Paul Rayson

November 1999, CSEG awayday, Paul Rayson

A brief history of UCREL
In ten minutes, I will present a brief history of UCREL (the University Centre for Computer Corpus Research on Language), which has members in the Computing and Linguistics Departments. UCREL specialises in the automatic or computer-aided analysis of large bodies of naturally occurring language ('corpora'). UCREL has a record of achievement of more than twenty years as a pioneer in this field, from the first million-word corpus of British English in 1978 (LOB) to the present-day REVERE project, which is piloting the application of UCREL's techniques to texts in the requirements engineering domain.
[Timeline: 1960 to 1999]

Noam Chomsky
Chomsky changed the direction of Linguistics away from empiricism and towards rationalism
– observation of naturally occurring data versus theory of how human language processing is actually undertaken
– model competence rather than performance
Early corpus linguists saw language as finite and collected it all!

Brown and LOB
One-million-word machine-readable corpora:
– Brown corpus (American English)
– Lancaster-Oslo-Bergen corpus (British English)

Grammatical analysis
Statistical word-class tagging by CLAWS: 98% accuracy, whereas simple rule-based systems achieved only the low 90s
Manual full parse: 3 million words for training
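To make "statistical word-class tagging" concrete, here is a minimal sketch of one common approach: a bigram hidden Markov model decoded with the Viterbi algorithm. It is an illustration of the general technique only, not a description of how CLAWS is implemented. The tag set, the probability tables and the function name `viterbi_tag` are all hypothetical and hand-set for the example. A real tagger would estimate them from a tagged corpus.

```python
import math

# Toy, hand-set probabilities (hypothetical; not taken from CLAWS or any corpus).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {  # TRANS[previous_tag][tag]
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {  # EMIT[tag][word]
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"walk": 0.5, "walks": 0.5},
}
UNSEEN = 1e-6  # small floor so unseen word/tag pairs do not give log(0)


def viterbi_tag(words):
    """Return the most probable tag sequence for `words` under the toy model."""
    if not words:
        return []

    def emit(tag, word):
        return EMIT[tag].get(word, UNSEEN)

    # best[i][tag] = (log probability of the best path ending in tag, previous tag)
    best = [{t: (math.log(START[t]) + math.log(emit(t, words[0])), None)
             for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            score, prev = max(
                (best[-1][p][0] + math.log(TRANS[p][t]), p) for p in TAGS
            )
            column[t] = (score + math.log(emit(t, word)), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag to the first word.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    path.reverse()
    return path


print(viterbi_tag(["the", "dog", "walk"]))
print(viterbi_tag(["the", "walk"]))
```

In this toy model the ambiguous word "walk" is tagged as a verb after a noun and as a noun after a determiner, which is the kind of context-sensitive decision that a purely word-by-word lookup cannot make.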

Speech recognition
In collaboration with IBM’s continuous speech recognition group
Produced a detailed analysis of spoken data and a grammar of spoken English

Word sense tagging
Automatic tagger to assign semantic tags
Tags differentiate dictionary word senses to an accuracy of 91%
Full text, hybrid rule-based and statistical
Application to market research

The next generation
British National Corpus
One hundred million words
All tagged at Lancaster

REVERE
Application of UCREL’s techniques to the requirements engineering domain
Legacy documents (specifications, ethnographic reports, manuals)
Assisting the requirements engineer in extracting domain knowledge

UCREL consultancy
CLAWS licences
Tagging services
– 146 million words processed