GATE technical workshop: introduction Hamish Cunningham Sheffield, March.


1 GATE technical workshop: introduction
  http://gate.ac.uk/
  http://nlp.shef.ac.uk/
  Hamish Cunningham
  Sheffield, March 17/18, 2004

2 Agenda

  Wednesday (G22)
    10.15: arrival, setup
    10.30: introductions, summary of background / skills
    10.40: mission, conventions, internal pages, GATE intro (hc)
    11.30: tools: cvs, jbuilder, tkdiff, building GATE (vt)
    12.00: break
    12.15: intro to the GUI (dm)
    1.30: lunch
    2.30: annie, jape (dm)
    4.00: break
    4.15: summary of projects (hc)
    5.30: close

  Thursday (G30)
    10.30: API, CREOLE lifecycle, java for jape [1] (vt)
    12.00: break
    12.15: tests, writing, running; API etc. [2] (hc, vt)
    1.30: lunch
    2.30: corpora, evaluation tools (dm, kb)
    3.00: machine learning (vt)
    4.00: break
    4.15: ontologies (kb)
    5.15: wrapup
    5.30: close

3 Blah
  mission
  conventions
  mailing lists
  roles and responsibilities

4 GATE (the Volkswagen Beetle of Language Processing) is:
  Eight years old (!), with thousands of users at hundreds of sites
  An architecture: a macro-level organisational picture for LE software systems.
  A framework: for programmers, GATE is an object-oriented class library that implements the architecture.
  A development environment: for language engineers, computational linguists et al., a graphical development environment.
  Some free components... and wrappers for other people's components
  Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.
  Free software (LGPL). Download at http://gate.ac.uk/download/

5 A bit of a nuisance (our users)

  GATE team projects.
  Past:
    Conceptual indexing: MUMIS: automatic semantic indices for sports video
    MUSE, cross-genre entity finder
    HSL, Health-and-safety IE
    Old Bailey: collaboration with HRI on 17th century court reports
    Multiflora: plant taxonomy text analysis for biodiversity research e-science
    EMILLE: S. Asian language corpus
    ACE / TIDES: Arabic, Chinese NE
    JHU summer w/s on semtagging
  Present:
    Advanced Knowledge Technologies: €12m UK five site collaborative project
    ETCSL: Sumerian digital library
    MiAKT: medical informatics / AKT
    SEKT: Semantic Knowledge Tech
    PrestoSpace: AV Preservation
    KnowledgeWeb; h-TechSight

  Thousands of users at hundreds of sites. A representative sample:
    the American National Corpus project
    the Perseus Digital Library project, Tufts University, US
    Longman Pearson publishing, UK
    Merck KGaA, Germany
    Canon Europe, UK
    Knight Ridder, US
    BBN (leading HLT research lab), US
    SMEs: Melandra, SG-MediaStyle,...
    Imperial College, London, the University of Manchester, UMIST, the University of Karlsruhe, Vassar College, the University of Southern California and a large number of other UK, US and EU Universities
    UK and EU projects inc. MyGrid, CLEF, dotkom, AMITIES, CubReporter, Poesia...

6 Architectural principles
  Non-prescriptive, theory neutral (strength and weakness)
  Re-use, interoperation, not reimplementation (e.g. diverse XML support, integration of Protégé, Jena, Weka...)
  (Almost) everything is a component, and component sets are user-extendable
  (Almost) all operations are available both from API and GUI

7 All the world’s a Java Bean....
  CREOLE: a Collection of REusable Objects for Language Engineering
  GATE components: modified Java Beans with XML configuration
  The minimal component = 10 lines of Java, 10 lines of XML, 1 URL
  Why bother? Allows the system to load arbitrary language processing components
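
The idea of loading arbitrary components from configuration can be sketched in plain Java. This is an illustration only and does not use GATE's actual API: TextComponent, LowerCaser and ComponentRegistry are hypothetical names, and the registry stands in for whatever an XML descriptor would supply (a component name mapped to a class name). The bean convention that matters here is the public no-argument constructor, which lets the system instantiate a class it has never seen at compile time.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a processing component; not GATE's real interface.
interface TextComponent {
    String process(String text);
}

// A minimal bean-style component: public no-arg constructor plus one method.
class LowerCaser implements TextComponent {
    public LowerCaser() {}

    @Override
    public String process(String text) {
        return text.toLowerCase(Locale.ROOT);
    }
}

// Hypothetical registry: maps a configured name to a class name,
// roughly the information an XML descriptor would carry.
class ComponentRegistry {
    private final Map<String, String> classNames = new LinkedHashMap<>();

    void register(String name, String className) {
        classNames.put(name, className);
    }

    // Instantiate the component reflectively from its configured class name.
    TextComponent create(String name) {
        String className = classNames.get(name);
        if (className == null) {
            throw new IllegalArgumentException("Unknown component: " + name);
        }
        try {
            Class<?> cls = Class.forName(className);
            return (TextComponent) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}

class ComponentDemo {
    public static void main(String[] args) {
        ComponentRegistry registry = new ComponentRegistry();
        registry.register("lowercaser", LowerCaser.class.getName());
        TextComponent component = registry.create("lowercaser");
        System.out.println(component.process("GATE Workshop"));
    }
}
```

Because the registry only ever sees a class name, a new component can be added without recompiling the code that loads it, which is the point of the "why bother?" line on the slide.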

8 Architecture diagram (layers and their contents)
  Document sources: HTML docs, RTF docs, XML docs, PDF docs, email, …
  Document Format Layer (LRs): XML Document Format, HTML Document Format, PDF Document Format, … Document Format
  DataStore Layer: XML, Oracle, PostgreSQL, .ser
  Corpus Layer (LRs): Corpus, Document, Document Content, Annotation Set, Annotation, Feature Map
  Processing Layer (PRs): NE, Co-ref, TEs, TRs, POS, …
  Language Resource Layer (LRs): Ontology, Protégé Ontology, WordNet, Gazetteers, ...
  Application Layer: ANNIE, OBIE, …
  IDE GUI Layer (VRs): ADiff, OntolVR, DocVR, ...
  GATE APIs

  NOTES
    everything is a replaceable bean
    all communication via fixed APIs
    low coupling, high modularity, high extensibility

  NOTES (2)
    e.g. Protégé LR & VR both wrapped in Res. (bean) API
    ontology repositories and inference should be the same: KAON + Sesame + Orenge + ?
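
The Corpus Layer names on this slide (Corpus, Document, Document Content, Annotation Set, Annotation, Feature Map) suggest a containment hierarchy in which annotations point into the document text by offset and carry a map of features. A minimal sketch of that shape follows. The classes SimpleCorpus, SimpleDocument and SimpleAnnotation are simplified, hypothetical stand-ins written for this illustration, not GATE's own classes, and the sample sentence is invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the Corpus Layer names; not GATE's real classes.
class SimpleAnnotation {
    final String type;
    final int start; // inclusive character offset into the document content
    final int end;   // exclusive character offset
    final Map<String, Object> features = new HashMap<>(); // the "feature map"

    SimpleAnnotation(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

class SimpleDocument {
    final String content; // the "document content"
    final List<SimpleAnnotation> annotations = new ArrayList<>(); // the "annotation set"

    SimpleDocument(String content) {
        this.content = content;
    }

    // Add an annotation over [start, end) after checking the offsets.
    SimpleAnnotation annotate(String type, int start, int end) {
        if (start < 0 || end > content.length() || start > end) {
            throw new IllegalArgumentException("Bad offsets: " + start + ".." + end);
        }
        SimpleAnnotation annotation = new SimpleAnnotation(type, start, end);
        annotations.add(annotation);
        return annotation;
    }

    // Recover the text an annotation covers; the content itself is never modified.
    String textOf(SimpleAnnotation annotation) {
        return content.substring(annotation.start, annotation.end);
    }

    List<SimpleAnnotation> ofType(String type) {
        List<SimpleAnnotation> result = new ArrayList<>();
        for (SimpleAnnotation a : annotations) {
            if (a.type.equals(type)) {
                result.add(a);
            }
        }
        return result;
    }
}

class SimpleCorpus {
    final List<SimpleDocument> documents = new ArrayList<>();
}

class CorpusDemo {
    public static void main(String[] args) {
        SimpleCorpus corpus = new SimpleCorpus();
        SimpleDocument doc = new SimpleDocument("Hamish Cunningham works in Sheffield.");
        corpus.documents.add(doc);

        SimpleAnnotation person = doc.annotate("Person", 0, 17);
        person.features.put("kind", "fullName");
        SimpleAnnotation place = doc.annotate("Location", 27, 36);

        System.out.println(doc.textOf(person));
        System.out.println(doc.textOf(place));
    }
}
```

Keeping annotations separate from the text (standoff style) means several components can add their own annotation types to the same document without interfering with each other or altering the content.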

9 Happy Birthday Valy!

