The Semantic Web and Language Technology

The Semantic Web and Language Technology
BT Exact, Martlesham
Hamish Cunningham, Department of Computer Science, University of Sheffield
Friday, October 11th 2002
Next-generation web; GATE, language technology infrastructure

A Ubiquitous Permeable Web
The next generation of the web must be:
– ubiquitous: semantics for every device, every organisation, every individual;
– permeable: allowing contextual data to penetrate and persist;
– companionable: able to engage with us via multiple natural modalities.
Roles for Language Technology:
– discovery of semantics (ubiquity);
– mediating between context and personal semantic memories (permeability);
– conversing with people and the semantic web (companionableness).

Critical Mass for the Semantic Web
The SW: machine-processable, repurposable data to complement hypertext.
But semantic content makes up only a tiny percentage of the Web. How can critical mass be achieved? Through automatic annotation on a huge scale.
Requirements:
Huge scale:
– freely available to all EU citizens
– distributed (over a Grid)
– repurposable (delivered as Web Services)
Portability and robustness via:
– simple and therefore shallow HLT methods
– positive (+ve) and negative (–ve) learning
– analogues of IPSEs (Integrated Project Support Environments) for computer-literate users

Motivation for Software Infrastructure for Language Engineering
– Need for scalable, reusable and portable HLT solutions
– Support for large data, in multiple media, languages, formats and locations
– Lowering the cost of creating new language processing components
– Promoting quantitative evaluation metrics via tools and a level playing field

Motivation (II)

GATE, a General Architecture for Text Engineering
– An architecture: a macro-level organisational picture for LE software systems.
– A framework: for programmers, GATE is an object-oriented class library that implements the architecture.
– A development environment: for language engineers, computational linguists et al., GATE is a graphical development environment bundled with a set of tools for doing, e.g., Information Extraction.
Some free components, and wrappers for other people's components.
Tools for: evaluation; visualisation/editing; persistence; IR; IE; dialogue; ontologies; etc.
Free software (LGPL). Download at

Architectural principles
– Non-prescriptive, theory-neutral (a strength and a weakness)
– Re-use and interoperation, not reimplementation (e.g. diverse XML support; integration of tools like Protégé, Jena and Weka)
– (Almost) everything is a component, and component sets are user-extendable
Component-based development
– An OO way of chunking software: Java Beans
– GATE components: CREOLE = modified Java Beans (Collection of REusable Objects for Language Engineering)
– The minimal component = 10 lines of Java, 10 lines of XML, 1 URL
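
The slide describes CREOLE components as modified Java Beans. One reason the Beans convention suits a component framework is that a component's parameters can be discovered by reflection instead of being hard-coded. The sketch below shows only that general idea, using the standard java.beans.Introspector; the component class and its two parameters are hypothetical, and this is not GATE's CREOLE API (which, as the slide notes, also relies on XML metadata).

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

final class CreoleStyleDemo {

    // Hypothetical component written to the Java Beans conventions:
    // a public no-arg constructor and a getter/setter pair per parameter.
    public static class MinimalTokeniserBean {
        private String encoding = "UTF-8";
        private boolean splitOnPunctuation = true;

        public MinimalTokeniserBean() { }

        public String getEncoding() { return encoding; }
        public void setEncoding(String encoding) { this.encoding = encoding; }

        public boolean isSplitOnPunctuation() { return splitOnPunctuation; }
        public void setSplitOnPunctuation(boolean value) { this.splitOnPunctuation = value; }
    }

    // Stops at Object.class so the inherited getClass() is not reported.
    private static PropertyDescriptor[] describe(Class<?> beanClass) {
        try {
            return Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Not a valid bean: " + beanClass, e);
        }
    }

    // Names of all writable properties, i.e. the parameters a generic
    // framework could expose without component-specific code.
    static List<String> parameterNames(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : describe(beanClass)) {
            if (pd.getWriteMethod() != null) names.add(pd.getName());
        }
        return names;
    }

    // Sets a parameter by name through its bean setter, using reflection only.
    static void setParameter(Object bean, String name, Object value) {
        for (PropertyDescriptor pd : describe(bean.getClass())) {
            if (pd.getName().equals(name) && pd.getWriteMethod() != null) {
                try {
                    pd.getWriteMethod().invoke(bean, value);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not set " + name, e);
                }
                return;
            }
        }
        throw new IllegalArgumentException("No writable parameter: " + name);
    }

    public static void main(String[] args) {
        System.out.println(parameterNames(MinimalTokeniserBean.class));
        MinimalTokeniserBean bean = new MinimalTokeniserBean();
        setParameter(bean, "encoding", "ISO-8859-1");
        System.out.println(bean.getEncoding()); // ISO-8859-1
    }
}
```

A generic loader or GUI could call parameterNames to build a form for any such component and setParameter to apply the user's choices, with no compile-time knowledge of the component class.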

GATE Language Resources
– GATE LRs are documents, ontologies, corpora, lexicons, …
– Documents / corpora: GATE documents are loaded from local files or the web.
– Diverse document formats: text, HTML, XML, RTF, SGML.
Processing Resources
– Algorithmic components known as PRs: beans with execute methods.
– All PRs can handle Unicode data by default.
– Clear distinction between code and data (simple repurposing).
Freebies with GATE, e.g.: named entity recognition; WordNet; Protégé; Ontology; OntoGazetteer; DAML+OIL export; Information Retrieval based on Lucene.
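
The LR/PR split can be illustrated with a small, hypothetical model: a document that only holds text and stand-off annotations (data), and a component with an execute() method (code) that adds Token annotations. Names such as TextDocument and ProcessingComponent are invented for this sketch and are not GATE's classes. The tokeniser advances by Unicode code point so that characters outside the Basic Multilingual Plane are never split in half.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A stand-off annotation: a typed character span plus a feature map.
final class SpanAnnotation {
    final String type;
    final int start; // inclusive offset into the document text
    final int end;   // exclusive offset
    final Map<String, String> features;

    SpanAnnotation(String type, int start, int end, Map<String, String> features) {
        this.type = type;
        this.start = start;
        this.end = end;
        this.features = Collections.unmodifiableMap(new LinkedHashMap<>(features));
    }
}

// Language resource: data only (text and annotations), no processing logic.
final class TextDocument {
    final String content;
    final List<SpanAnnotation> annotations = new ArrayList<>();

    TextDocument(String content) { this.content = content; }

    List<SpanAnnotation> ofType(String type) {
        List<SpanAnnotation> result = new ArrayList<>();
        for (SpanAnnotation a : annotations) {
            if (a.type.equals(type)) result.add(a);
        }
        return result;
    }
}

// Processing resource: code with an execute() method that works on a document.
interface ProcessingComponent {
    void setDocument(TextDocument doc);
    void execute();
}

// Splits on whitespace and records an orthography feature for each token.
final class WhitespaceTokeniser implements ProcessingComponent {
    private TextDocument doc;

    @Override public void setDocument(TextDocument doc) { this.doc = doc; }

    @Override public void execute() {
        String text = doc.content;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) { i += Character.charCount(cp); continue; }
            int start = i;
            // Advance by whole code points so surrogate pairs stay together.
            while (i < text.length() && !Character.isWhitespace(text.codePointAt(i))) {
                i += Character.charCount(text.codePointAt(i));
            }
            String word = text.substring(start, i);
            Map<String, String> features = new LinkedHashMap<>();
            features.put("string", word);
            features.put("orthography", orthography(word));
            doc.annotations.add(new SpanAnnotation("Token", start, i, features));
        }
    }

    static String orthography(String word) {
        int first = word.codePointAt(0);
        if (Character.isUpperCase(first)) return "upperInitial";
        if (Character.isLowerCase(first)) return "lowercase";
        return "other";
    }

    public static void main(String[] args) {
        TextDocument doc = new TextDocument("Sheffield hosts GATE");
        ProcessingComponent tokeniser = new WhitespaceTokeniser();
        tokeniser.setDocument(doc);
        tokeniser.execute();
        for (SpanAnnotation a : doc.ofType("Token")) {
            System.out.println(a.start + "-" + a.end + " " + a.features);
        }
    }
}
```

Because the tokeniser keeps no per-document state beyond the reference passed to setDocument, the same component can be reused over any number of documents, which is one practical payoff of separating code from data.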

A Language Analysis Example
[Figure: architecture diagram. Labels: GATE format handlers (HTML docs, RTF docs, XML docs, …); ANNIE and Custom application 1, built from components such as POS tagger, named entity, coreference and event extraction; document content, document metadata, document format data and linguistic data; storage in file storage, …, or a relational database (Oracle/PostgreSQL).]
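
The diagram's labels suggest that format handlers turn HTML, RTF or XML sources into document content plus separate document format data. A minimal sketch of that separation follows, under stated assumptions: the class names are hypothetical, it handles only simple well-formed tags, and it is not how GATE's own format handlers are implemented. Each element is kept as a stand-off span over the extracted text, so the original markup is not lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// An element's extent over the extracted plain text: [start, end).
final class MarkupSpan {
    final String tag;
    final int start;
    final int end;
    MarkupSpan(String tag, int start, int end) { this.tag = tag; this.start = start; this.end = end; }
    @Override public String toString() { return tag + "[" + start + "," + end + ")"; }
}

// Plain text ("document content") plus markup kept as spans ("format data").
final class UnpackedDocument {
    final String content;
    final List<MarkupSpan> formatData;
    UnpackedDocument(String content, List<MarkupSpan> formatData) {
        this.content = content;
        this.formatData = formatData;
    }
}

final class SimpleMarkupHandler {
    // Assumes simple, well-formed tags: no entities, comments, CDATA or '>' in attributes.
    static UnpackedDocument unpack(String source) {
        StringBuilder text = new StringBuilder();
        List<MarkupSpan> spans = new ArrayList<>();
        Deque<String> openTags = new ArrayDeque<>();
        Deque<Integer> openOffsets = new ArrayDeque<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c != '<') { text.append(c); i++; continue; }
            int close = source.indexOf('>', i);
            if (close < 0) throw new IllegalArgumentException("Unterminated tag at " + i);
            String inside = source.substring(i + 1, close).trim();
            i = close + 1;
            if (inside.startsWith("/")) {
                // Closing tag: the span ends at the current text offset.
                String name = inside.substring(1).trim();
                if (openTags.isEmpty() || !openTags.peek().equals(name)) {
                    throw new IllegalArgumentException("Mismatched </" + name + ">");
                }
                openTags.pop();
                spans.add(new MarkupSpan(name, openOffsets.pop(), text.length()));
            } else if (inside.endsWith("/")) {
                // Empty element such as <br/>: a zero-length span.
                String name = inside.substring(0, inside.length() - 1).trim().split("\\s+")[0];
                spans.add(new MarkupSpan(name, text.length(), text.length()));
            } else {
                // Opening tag: remember the name (attributes are ignored) and offset.
                openTags.push(inside.split("\\s+")[0]);
                openOffsets.push(text.length());
            }
        }
        if (!openTags.isEmpty()) throw new IllegalArgumentException("Unclosed <" + openTags.peek() + ">");
        return new UnpackedDocument(text.toString(), spans);
    }

    public static void main(String[] args) {
        UnpackedDocument d = unpack("<p><b>GATE</b> is free</p>");
        System.out.println(d.content);    // GATE is free
        System.out.println(d.formatData); // [b[0,4), p[0,12)]
    }
}
```

Later components can then work on clean text while still being able to consult the markup spans, for example to treat a bold span as a heading.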


Building IE Components in GATE (1)
The ANNIE system: a reusable and easily extendable set of components
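
One way to picture a reusable, extendable component set is a serial pipeline: each document passes through an ordered list of stages, and later stages read annotations written by earlier ones. A minimal sketch follows, assuming two illustrative stages (a letter-and-digit tokeniser and a dictionary lookup). All names are hypothetical; neither the stages nor the pipeline class are ANNIE's actual components or GATE's controller API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class Ann {
    final String type;
    final int start;
    final int end;
    final String value;
    Ann(String type, int start, int end, String value) {
        this.type = type; this.start = start; this.end = end; this.value = value;
    }
}

final class Doc {
    final String text;
    final List<Ann> anns = new ArrayList<>();
    Doc(String text) { this.text = text; }

    // Returns a copy, so stages may add annotations while iterating the result.
    List<Ann> get(String type) {
        List<Ann> out = new ArrayList<>();
        for (Ann a : anns) if (a.type.equals(type)) out.add(a);
        return out;
    }
}

interface Stage { void run(Doc doc); }

final class SerialPipeline {
    private final List<Stage> stages = new ArrayList<>();

    SerialPipeline add(Stage stage) { stages.add(stage); return this; }

    // Each document passes through every stage in order, so later stages
    // can build on annotations added by earlier ones.
    void runOver(List<Doc> corpus) {
        for (Doc doc : corpus) {
            for (Stage stage : stages) stage.run(doc);
        }
    }
}

// Stage 1: tokens are maximal runs of letters or digits.
final class TokenStage implements Stage {
    @Override public void run(Doc doc) {
        String t = doc.text;
        int i = 0;
        while (i < t.length()) {
            if (!Character.isLetterOrDigit(t.charAt(i))) { i++; continue; }
            int start = i;
            while (i < t.length() && Character.isLetterOrDigit(t.charAt(i))) i++;
            doc.anns.add(new Ann("Token", start, i, t.substring(start, i)));
        }
    }
}

// Stage 2: a gazetteer that looks up each Token produced by stage 1.
final class GazetteerStage implements Stage {
    private final Map<String, String> entries; // token string -> lookup kind
    GazetteerStage(Map<String, String> entries) { this.entries = entries; }

    @Override public void run(Doc doc) {
        for (Ann tok : doc.get("Token")) {
            String kind = entries.get(tok.value);
            if (kind != null) doc.anns.add(new Ann("Lookup", tok.start, tok.end, kind));
        }
    }
}
```

Extending the system then means appending another Stage, for example a rule-based stage that reads the Token and Lookup annotations the first two stages produce.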

Building IE Components in GATE (2)
JAPE: a Java Annotation Patterns Engine
– Light, robust regular-expression-based processing
– Cascaded finite state transduction
– Low-overhead development of new components

Rule: Company1
Priority: 25
(
  ( {Token.orthography == upperInitial} )+
  {Lookup.kind == companyDesignator}
):companyMatch
-->
:companyMatch.NamedEntity = { kind = company, rule = "Company1" }
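
To make the Company1 rule's meaning concrete, here is a hand-coded Java equivalent: one or more tokens with orthography upperInitial followed by a companyDesignator lookup yields a NamedEntity with kind company. It is only a sketch under simplifying assumptions stated in the comments; JAPE itself, as the slide notes, works by finite state transduction over annotations rather than by running code like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplifications (assumptions of this sketch): the input is a flat token
// sequence, a Lookup covers exactly one token, and priorities are ignored.
final class Tok {
    final String string;
    final int start;
    final int end;
    final String orthography; // e.g. "upperInitial", "lowercase"
    final String lookupKind;  // e.g. "companyDesignator", or null if no Lookup
    Tok(String string, int start, String orthography, String lookupKind) {
        this.string = string;
        this.start = start;
        this.end = start + string.length();
        this.orthography = orthography;
        this.lookupKind = lookupKind;
    }
}

final class NamedEntity {
    final String kind;
    final String rule;
    final int start;
    final int end;
    NamedEntity(String kind, String rule, int start, int end) {
        this.kind = kind; this.rule = rule; this.start = start; this.end = end;
    }
    @Override public String toString() { return kind + "(" + rule + ")[" + start + "," + end + ")"; }
}

final class Company1Matcher {
    // Emulates ( {Token.orthography == upperInitial} )+ {Lookup.kind == companyDesignator},
    // scanning left to right and preferring the longest match at each start.
    static List<NamedEntity> apply(List<Tok> toks) {
        List<NamedEntity> out = new ArrayList<>();
        int i = 0;
        while (i < toks.size()) {
            // e = end (exclusive) of the maximal run of upperInitial tokens from i.
            int e = i;
            while (e < toks.size() && "upperInitial".equals(toks.get(e).orthography)) e++;
            // The designator may follow the run or be its last token
            // ("Ltd" is itself upperInitial); at least one token must precede it.
            int match = -1;
            for (int m = Math.min(e, toks.size() - 1); m > i; m--) {
                if ("companyDesignator".equals(toks.get(m).lookupKind)) { match = m; break; }
            }
            if (match < 0) { i++; continue; }
            out.add(new NamedEntity("company", "Company1", toks.get(i).start, toks.get(match).end));
            i = match + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = Arrays.asList(
                new Tok("Acme", 0, "upperInitial", null),
                new Tok("Widgets", 5, "upperInitial", null),
                new Tok("Ltd", 13, "upperInitial", "companyDesignator"),
                new Tok("announced", 17, "lowercase", null));
        System.out.println(apply(toks)); // [company(Company1)[0,16)]
    }
}
```

Writing the same logic by hand shows what the declarative rule saves: the backtracking over the "+" repetition and the offset bookkeeping are handled by the engine, leaving the rule writer with a few lines of pattern and action.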

The Semantic Web and GATE
GATE is being used to develop (semi-)automatic methods for:
– linking web pages to ontologies using Information Extraction;
– learning and evolving ontologies via IE and lexical semantic network traversal.
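
The slide mentions learning ontologies via lexical semantic network traversal. As a rough, hypothetical illustration: walking hypernym (broader-term) links breadth-first yields candidate superclasses for a term, and the nearest ancestor shared by two terms is one candidate parent class for both. The network, its words and its links are invented for this sketch; it does not use WordNet or any GATE API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy lexical network: word -> broader terms (hypernyms). Hypothetical data only.
final class HypernymNetwork {
    private final Map<String, List<String>> hypernyms = new HashMap<>();

    void addHypernym(String word, String broader) {
        hypernyms.computeIfAbsent(word, k -> new ArrayList<>()).add(broader);
    }

    // Breadth-first traversal of hypernym links: nearer ancestors come first.
    List<String> ancestors(String word) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(word);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String parent : hypernyms.getOrDefault(current, Collections.emptyList())) {
                if (seen.add(parent)) queue.add(parent); // 'seen' also guards against cycles
            }
        }
        return new ArrayList<>(seen);
    }

    // Nearest ancestor of a that is also an ancestor of b, or null if none:
    // a candidate superclass when placing both terms in an ontology.
    String commonAncestor(String a, String b) {
        Set<String> ofB = new LinkedHashSet<>(ancestors(b));
        for (String candidate : ancestors(a)) {
            if (ofB.contains(candidate)) return candidate;
        }
        return null;
    }

    public static void main(String[] args) {
        HypernymNetwork net = new HypernymNetwork();
        net.addHypernym("company", "organisation");
        net.addHypernym("university", "educational_institution");
        net.addHypernym("educational_institution", "organisation");
        net.addHypernym("organisation", "social_group");
        System.out.println(net.ancestors("university")); // [educational_institution, organisation, social_group]
        System.out.println(net.commonAncestor("university", "company")); // organisation
    }
}
```

In a semi-automatic setting, suggestions like these would be proposals for a human ontology editor to accept or reject rather than changes applied directly.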

Populating Ontologies with IE
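
A minimal sketch of ontology population: entities found by IE, each with a kind such as company, are mapped to ontology classes and added as instances, with light normalisation to avoid duplicates. The kind-to-class mapping, the TinyOntology class and the normalisation rule are all assumptions made for illustration; this is not the API of GATE's ontology support or of Protégé.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TinyOntology {
    private final Map<String, String> superclassOf = new HashMap<>();
    private final Map<String, Set<String>> instancesOf = new HashMap<>();

    // Declares a class; superclass is null for a root class. Assumes no cycles.
    void addClass(String name, String superclass) {
        superclassOf.put(name, superclass);
        instancesOf.putIfAbsent(name, new LinkedHashSet<>());
    }

    // Returns true if the instance was new for this class.
    boolean addInstance(String className, String instance) {
        Set<String> members = instancesOf.get(className);
        if (members == null) throw new IllegalArgumentException("Unknown class: " + className);
        return members.add(instance);
    }

    boolean isSubclassOf(String sub, String sup) {
        for (String c = sub; c != null; c = superclassOf.get(c)) {
            if (c.equals(sup)) return true;
        }
        return false;
    }

    // Instances of the class itself and of all its transitive subclasses.
    Set<String> allInstances(String className) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : instancesOf.entrySet()) {
            if (isSubclassOf(e.getKey(), className)) result.addAll(e.getValue());
        }
        return result;
    }
}

// An entity as an IE step might report it: surface text plus a kind such as "company".
final class ExtractedEntity {
    final String text;
    final String kind;
    ExtractedEntity(String text, String kind) { this.text = text; this.kind = kind; }
}

final class OntologyPopulator {
    // Adds each entity as an instance of the class its kind maps to; kinds with
    // no mapping are skipped. Returns the number of new instances.
    static int populate(TinyOntology onto, Map<String, String> kindToClass, List<ExtractedEntity> entities) {
        int added = 0;
        for (ExtractedEntity e : entities) {
            String cls = kindToClass.get(e.kind);
            if (cls == null) continue;
            // Collapse whitespace so "Acme  Ltd" and "Acme Ltd" become one instance.
            String name = e.text.trim().replaceAll("\\s+", " ");
            if (onto.addInstance(cls, name)) added++;
        }
        return added;
    }

    public static void main(String[] args) {
        TinyOntology onto = new TinyOntology();
        onto.addClass("Organisation", null);
        onto.addClass("Company", "Organisation");
        onto.addClass("Person", null);
        Map<String, String> kindToClass = new HashMap<>();
        kindToClass.put("company", "Company");
        kindToClass.put("person", "Person");
        int added = populate(onto, kindToClass, Arrays.asList(
                new ExtractedEntity("Acme  Ltd", "company"),
                new ExtractedEntity("Acme Ltd", "company"),
                new ExtractedEntity("Ada Lovelace", "person"),
                new ExtractedEntity("Sheffield", "location")));
        System.out.println(added);                             // 2
        System.out.println(onto.allInstances("Organisation")); // [Acme Ltd]
    }
}
```

Because instances are attached to the most specific class, a query for a broader class such as Organisation picks them up through the subclass hierarchy.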

Protégé and Ontology Management

Information Retrieval Support
Based on the Lucene IR engine
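
Lucene, like most IR engines, is built around an inverted index that maps each term to the documents containing it. The class below is a from-scratch toy with conjunctive (AND) queries, written only to show that data structure; it is not Lucene's API and omits ranking, configurable analysis and persistence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Adds a document and returns its id (its position in insertion order).
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Ids of documents containing every query term, in ascending order.
    List<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> p = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(p);
            else result.retainAll(p); // intersect posting lists
        }
        return result == null ? Collections.emptyList() : new ArrayList<>(result);
    }

    // Lower-cases and splits on anything that is not a Unicode letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.add("GATE supports information extraction");
        idx.add("Lucene supports information retrieval");
        idx.add("Ontologies and information extraction");
        System.out.println(idx.searchAll("information extraction")); // [0, 2]
    }
}
```

Answering a query only touches the posting lists of its terms, which is why inverted indexes scale to large document collections far better than scanning every document.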

Displaying Multilingual Data
All the visualisation and editing tools for multilingual (ML) LRs use enhanced Java facilities.
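
The slide does not say which Java facilities are meant, so the two shown here are an assumption: both are standard and both matter for multilingual text. Java strings are UTF-16, so String.length() counts char units rather than characters, and java.text.Normalizer can bring canonically equivalent strings into the same form (NFC) before they are compared.

```java
import java.text.Normalizer;

final class UnicodeFacilities {
    // Number of Unicode code points, which can differ from String.length()
    // because characters outside the BMP take two UTF-16 chars.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Compares two strings after canonical composition (NFC), so that a
    // precomposed character and its base + combining-mark form match.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // é as a single code point
        String decomposed = "e\u0301"; // e followed by COMBINING ACUTE ACCENT
        System.out.println(precomposed.equals(decomposed));            // false
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
        String gothic = "\uD800\uDF30"; // U+10330 GOTHIC LETTER AHSA
        System.out.println(gothic.length() + " " + codePoints(gothic)); // 2 1
    }
}
```

Tools that count, slice or compare text in many languages need both facts: offsets counted in chars can fall inside a surrogate pair, and visually identical strings may differ byte for byte unless they are normalised first.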

Applications
GATE has been used for a variety of applications, including:
– MUMIS: automatic creation of semantic indexes for multimedia programme material
– MUSE: a multi-genre IE system
– Metadata for Medline (at Merck)
– ACE: participation in the Automatic Content Extraction programme
– HSE: summarisation of health and safety information from company reports
– OldBaileyIE: NE recognition on 17th-century Old Bailey court reports
– Various medical informatics and database technology projects
– IE in Romanian, Bulgarian, Greek, Bengali, Spanish, Swedish, German, Italian and French (Arabic, Chinese and Russian this autumn)

Conclusion
GATE: an infrastructure that lowers the overhead of creating and embedding robust NLP components.
Further information:
– Online demos, tutorials and documentation
– Software downloads
– Talks and papers