Ontology-Aware Information Extraction
Hamish Cunningham, Kalina Bontcheva
Department of Computer Science, University of Sheffield
OntoWeb 4, SIG 5, 2002

2(12) GATE, a General Architecture for Text Engineering
GATE is…
– An architecture: a macro-level organisational picture for language engineering (LE) software systems.
– A framework: for programmers, GATE is an object-oriented class library that implements the architecture.
– A development environment: for language engineers, computational linguists et al., GATE is a graphical development environment bundled with a set of tools for doing, e.g., Information Extraction.
Free software (LGPL). Mature, robust software (in development since 1995). Download at
Comes with…
– Some free components...
– ...and wrappers for other people's components
– Tools for: evaluation; visualisation/editing; persistence; IR; IE; dialogue; ontologies; etc.
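
GATE itself is a Java class library, and nothing below is its API. As a rough Python analogy of the architecture described above, this sketch assumes that processing components run in sequence over a shared document and communicate only through stand-off annotations (start and end offsets plus a type and features). `Document`, `tokeniser`, `orthography` and `run_pipeline` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-off annotation: (start offset, end offset, type, features).
Annotation = Tuple[int, int, str, Dict[str, str]]


@dataclass
class Document:
    """A toy document: raw text plus a list of stand-off annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# In this sketch a processing component is any function that adds to or
# updates the annotations of a document in place.
Component = Callable[[Document], None]


def tokeniser(doc: Document) -> None:
    """Annotate each whitespace-separated word as a Token with its offsets."""
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        end = start + len(word)
        doc.annotations.append((start, end, "Token", {"string": word}))
        pos = end


def orthography(doc: Document) -> None:
    """Add an 'orth' feature to Tokens that start with an upper-case letter."""
    for _start, _end, ann_type, features in doc.annotations:
        if ann_type == "Token" and features["string"][:1].isupper():
            features["orth"] = "upperInitial"


def run_pipeline(doc: Document, components: List[Component]) -> Document:
    """Run the components in order; later ones see earlier ones' annotations."""
    for component in components:
        component(doc)
    return doc


doc = run_pipeline(Document("Hamish works in Sheffield"), [tokeniser, orthography])
for start, end, ann_type, features in doc.annotations:
    print(start, end, ann_type, features)
```

Because each component only reads and writes annotations, one of them could be replaced by a wrapper around someone else's tool without changing the others; that decoupling is the point the sketch is meant to illustrate.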

3(12) Applications; languages
GATE has been used for a variety of applications, including:
– MUMIS: automatic creation of semantic indexes for multimedia programme material
– MUSE: a multi-genre IE system
– EMILLE: a 70-million-word corpus of Indic languages
– Metadata for Medline (at Merck)
– Creation of metadata for Semantic Web Services; documentation using NLG
– HSE: summarisation of health and safety information from company reports
– OldBaileyIE: NE recognition on 17th-century Old Bailey court reports
– AKT: language technology in knowledge management
– AMITIES: call-centre automation
– Digital libraries / e-philology for researchers in ancient languages
– Various medical informatics and database technology projects
– IE in Romanian, Bulgarian, Greek, Bengali, Spanish, Swedish, German, Italian and French (Arabic, Chinese and Russian next year)

4(12) Some users…
At the time of writing, a representative sample of GATE users includes:
– Longman Pearson publishing, UK
– BT Exact Technologies, UK
– Merck KGaA, Germany
– Canon Europe, UK
– Knight Ridder (the second-biggest US news publisher)
– BBN Technologies, US
– Sirma AI Ltd., Bulgaria
– Resco AB, Sweden/Finland/Germany
– GlaxoSmithKline plc: drug-based navigation of Medline abstracts
– Master Foods NV: extraction of commodity events from news
– The American National Corpus project, US
– Imperial College, London, the University of Manchester, Queen Mary College, UMIST, the University of Karlsruhe, Vassar College, ISI / the University of Southern California, and a large number of other UK, US and EU universities
– The Perseus Digital Library project, Tufts University, US

5(12) Scientific method and HLT
How do we really know that this stuff works?!
– Open-source systems make experiments easier to repeat and therefore reduce site-specific skew effects.
– GATE's IE tools have competed in MUC, TREC (QA), ACE and DUC.
– TIDES Surprise Language exercise next year.
– GATE includes markup and automated evaluation tools, which make quantitative evaluation easier.
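
The standard quantitative measures for comparing IE output against a hand-annotated key are precision, recall and their harmonic mean, F1. The minimal sketch below computes them under an assumed strict-match criterion (same offsets and same type); it illustrates the measures only, it is not GATE's evaluation code, and the sample spans are invented.

```python
from typing import Set, Tuple

# (start offset, end offset, annotation type)
Span = Tuple[int, int, str]


def precision_recall_f1(key: Set[Span], response: Set[Span]) -> Tuple[float, float, float]:
    """Score system annotations (response) against gold-standard ones (key).

    Strict matching is assumed: a response annotation counts as correct only
    if the key contains an identical (start, end, type) triple.
    """
    correct = len(key & response)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


key = {(0, 6, "Person"), (16, 25, "Location"), (30, 37, "Organization")}
response = {(0, 6, "Person"), (16, 25, "Organization")}
p, r, f = precision_recall_f1(key, response)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.50 R=0.33 F1=0.40
```

Lenient scoring, which gives credit for partially overlapping spans, would need a looser comparison than the set intersection used here.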

6(12) Collaboration opportunities
Interoperation and integration, not reinvention: collaboration, not competition
– Take the code, do what you like with it, and perhaps contribute something back
– Involve us in your 6th Framework projects
– Join KITShare: a network of excellence in Knowledge and Interface Tool Sharing

7(12) The Holy Grail
Problem: the gap between many current IE tools and Semantic Web needs

8(12) What is needed?
Content Extraction, not Information Extraction
– Identify the ontological reference, not just the class
– Maintain referential integrity (coreference)
Ontology-aware IE tools
– Use instances already in the ontology
– React to changes in the ontology
Support experienced users in changing the IE tools
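
The difference between extracting a class and extracting a reference can be made concrete. In the minimal sketch below, two hypothetical instances share the class `Person`, so only a link to the instance says who is meant, and every mention in one coreference chain receives the same instance, which is one simple way to keep referential integrity. The toy `ONTOLOGY`, the `ex:` identifiers and the exact-name matching are assumptions, and the coreference chains are assumed to have been produced beforehand by a separate module.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy ontology: instance id -> (class, names known for that instance).
# Both instances belong to the same class, so the class alone cannot identify them.
ONTOLOGY: Dict[str, Tuple[str, Set[str]]] = {
    "ex:Bush": ("Person", {"G. Bush", "Bush"}),
    "ex:Brown": ("Person", {"G. Brown", "Brown"}),
}


def resolve_chain(chain: List[str]) -> Optional[str]:
    """Return the single instance whose known names match a mention in the
    coreference chain, or None if nothing matches or the match is ambiguous."""
    candidates = {
        instance
        for instance, (_cls, names) in ONTOLOGY.items()
        if any(mention in names for mention in chain)
    }
    return next(iter(candidates)) if len(candidates) == 1 else None


def link_mentions(chain: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Give every mention in one coreference chain the same ontology instance,
    so that all references to one entity stay linked to one individual."""
    instance = resolve_chain(chain)
    return [(mention, instance) for mention in chain]


print(link_mentions(["G. Bush", "he", "Bush"]))
print(resolve_chain(["Smith"]))          # no known instance
print(resolve_chain(["Bush", "Brown"]))  # ambiguous chain
```

Returning None for an ambiguous chain, rather than guessing, is a choice of this sketch: a wrong link would break referential integrity for every mention in the chain at once.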

9(12) GATE and Content Extraction
ANNIE: an open-source IE system in GATE that provides the modules needed for content extraction:
– Pre-processing
– Named entity recognition
– Coreference resolution (ANNIE handles proper names, pronouns and nominals)
An easy-to-use pattern-action rule language enables customisation and post-processing of the IE results.
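
The slide does not show the rule language itself. As a loose Python analogy of the pattern-action idea only (a pattern over a token sequence on the left, an action that creates a new annotation on the right), the sketch below uses hypothetical token categories and rules; it does not reproduce the syntax or matching semantics of GATE's rule language.

```python
from typing import Callable, List, Tuple

# (token string, category); the categories here are hypothetical.
Token = Tuple[str, str]
Predicate = Callable[[Token], bool]
# A rule: a pattern (sequence of token tests) and the annotation type to create.
Rule = Tuple[List[Predicate], str]


def is_title(token: Token) -> bool:
    return token[1] == "Title"


def is_capitalised(token: Token) -> bool:
    return token[0][:1].isupper()


# Longer pattern listed first so it is tried before the shorter one.
RULES: List[Rule] = [
    ([is_title, is_capitalised, is_capitalised], "Person"),
    ([is_title, is_capitalised], "Person"),
]


def apply_rules(tokens: List[Token], rules: List[Rule]) -> List[Tuple[int, int, str]]:
    """Scan left to right; at each position the first rule whose pattern matches
    adds an annotation (start token, end token exclusive, type), and scanning
    resumes after the matched tokens."""
    results = []
    i = 0
    while i < len(tokens):
        for pattern, ann_type in rules:
            window = tokens[i:i + len(pattern)]
            if len(window) == len(pattern) and all(
                test(token) for test, token in zip(pattern, window)
            ):
                results.append((i, i + len(pattern), ann_type))
                i += len(pattern)
                break
        else:  # no rule matched at position i
            i += 1
    return results


tokens = [("Dr", "Title"), ("Kalina", "Word"), ("Bontcheva", "Word"),
          ("visited", "Word"), ("Prof", "Title"), ("Wilks", "Word"), ("today", "Word")]
print(apply_rules(tokens, RULES))
```

Listing the longer pattern first is a design choice of this sketch: since the first matching rule wins, "Dr Kalina Bontcheva" becomes a single three-token Person, while "Prof Wilks" falls through to the two-token rule.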

10(12) Populating Ontologies with ANNIE
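
The slide gives only its title. One simple way such population could work is sketched below, assuming that named-entity output arrives as (annotation type, name) pairs and that each annotation type maps to one ontology class. `TYPE_TO_CLASS`, the `ex:` class names and the input format are hypothetical, and the names are assumed to be already normalised by coreference, so that variants such as "H. Cunningham" do not create duplicate instances.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from named-entity annotation types to ontology classes.
TYPE_TO_CLASS: Dict[str, str] = {
    "Person": "ex:Person",
    "Organization": "ex:Organisation",
    "Location": "ex:Location",
}


def populate(ontology: Dict[str, Set[str]],
             entities: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Add each (annotation type, canonical name) pair as an instance of the
    mapped class. Types with no mapping (e.g. Date) are skipped, and because
    instances are kept in sets, adding the same entity twice has no effect."""
    for ann_type, name in entities:
        cls = TYPE_TO_CLASS.get(ann_type)
        if cls is not None:
            ontology.setdefault(cls, set()).add(name)
    return ontology


entities = [("Person", "Hamish Cunningham"), ("Location", "Sheffield"),
            ("Person", "Hamish Cunningham"), ("Date", "2002")]
print(populate({}, entities))
```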

11(12) Ontologies as explicit IE resources
Reuse, not reinvention:
– Protégé for ontology maintenance
– Sesame/KAON for storage and reasoning
Ontology-aware gazetteers:
– Provide the ontological class of each entry
– Use instances from the ontology for IE
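
An ontology-aware gazetteer can be pictured as lookup lists derived from the ontology rather than maintained separately. In the minimal sketch below, each known instance name becomes a lookup entry that carries both the instance and its class, and matching prefers the longest entry. `INSTANCES`, `build_gazetteer` and `lookup` are hypothetical names, and whitespace tokenisation is a simplifying assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical ontology instances: instance id -> (class, names for the instance).
INSTANCES: Dict[str, Tuple[str, List[str]]] = {
    "ex:Sheffield": ("City", ["Sheffield"]),
    "ex:UniSheffield": ("University", ["University of Sheffield"]),
}

Gazetteer = Dict[Tuple[str, ...], Tuple[str, str]]


def build_gazetteer(instances: Dict[str, Tuple[str, List[str]]]) -> Gazetteer:
    """Derive lookup entries from the ontology: each name, split into tokens,
    maps to the instance it names and that instance's class."""
    gazetteer: Gazetteer = {}
    for instance, (cls, names) in instances.items():
        for name in names:
            gazetteer[tuple(name.split())] = (instance, cls)
    return gazetteer


def lookup(tokens: List[str], gazetteer: Gazetteer) -> List[Tuple[int, int, str, str]]:
    """Longest-match lookup; returns (start, end, instance, class) per match,
    with token indices and an exclusive end."""
    max_len = max((len(entry) for entry in gazetteer), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entry = gazetteer.get(tuple(tokens[i:i + n]))
            if entry is not None:
                matches.append((i, i + n, entry[0], entry[1]))
                i += n
                break
        else:  # no entry starts at position i
            i += 1
    return matches


tokens = "She studied at the University of Sheffield in Sheffield".split()
print(lookup(tokens, build_gazetteer(INSTANCES)))
```

Because the entries are derived from the ontology, rebuilding the gazetteer after the ontology changes is one simple way for the lookup to pick up new instances. In the example, longest match also keeps the "Sheffield" inside "University of Sheffield" from being tagged separately as a City.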

12(12) Ontology-aware IE
– The IE tools can use available formal knowledge and reasoning
– Ontology-based anaphora resolution, e.g. G. Bush, G. Brown, the president
– The correct ontological classes are assigned to the recognised entities
– Changes in the ontology are available to the IE tools
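
The slide's example (G. Bush, G. Brown, the president) can be sketched as follows: instead of linking "the president" to the most recent name, the resolver picks the most recent antecedent whose ontological class is subsumed by the class the nominal denotes. This is a minimal illustration, not the method used in GATE; the toy class hierarchy, the class assigned to each instance and the mapping from "the president" to the class `President` are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology: direct superclass of each class, and class of each instance.
SUBCLASS_OF: Dict[str, str] = {
    "President": "Politician",
    "Chancellor": "Politician",
    "Politician": "Person",
}
INSTANCE_OF: Dict[str, str] = {"ex:Bush": "President", "ex:Brown": "Chancellor"}


def is_subclass(cls: Optional[str], ancestor: str) -> bool:
    """True if cls is ancestor or reaches it by following superclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def resolve_nominal(required_class: str, antecedents: List[str]) -> Optional[str]:
    """Return the most recent preceding instance whose class is compatible with
    the class the nominal denotes, or None if no antecedent fits."""
    for instance in reversed(antecedents):
        if is_subclass(INSTANCE_OF.get(instance), required_class):
            return instance
    return None


# Text order: "G. Bush ... G. Brown ... the president"
antecedents = ["ex:Bush", "ex:Brown"]
print(resolve_nominal("President", antecedents))   # ex:Bush
print(resolve_nominal("Politician", antecedents))  # ex:Brown
```

A purely recency-based resolver would choose ex:Brown for "the president", because it is mentioned last; the subclass check against the ontology is what rules it out.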