Semantic Annotation of Grey Literature from an Archaeological Digital Library Andreas Vlachidis, Doug Tudhope Hypermedia Research Unit University of Glamorgan.

Presentation transcript:

Semantic Annotation of Grey Literature from an Archaeological Digital Library Andreas Vlachidis, Doug Tudhope Hypermedia Research Unit University of Glamorgan

Semantic Technologies for Archaeology Resources
- AHRC-funded 3-year project
- Investigates semantic technologies for integrating and cross-searching datasets and associated grey literature
Acknowledgements
- Ceri Binding (Glamorgan)
- Keith May (English Heritage)

Is the Web machine-readable? Yes

Is the Web machine-understandable? No
[Diagram: Netherlands has_capital Amsterdam; Amsterdam type City]

Machine readable vs. machine understandable
- What we say to the machine: The Cat in the Hat; ISBN: ; Author: Dr. Seuss; Publisher: Collins
- What the machine understands: bla bla bla bla

STAR
- Current situation is one of fragmented datasets and applications, with different terminology systems
- Need for an integrative conceptual framework
  - English Heritage extended CIDOC CRM ontology for archaeology
- Need for terminology control
  - English Heritage Thesauri
  - Recording Manual glossaries augmented with dataset glossaries

General Architecture
[Architecture diagram, labels only]
- Sources: RRAD, RPRE, LEAP, STAN, IADB; grey literature; EH thesauri, glossaries
- Processing: Data Mapping / Normalisation, Conversion, Indexing
- RDF-based common ontology data layer (CRM / CRMEH / SKOS)
- Access: Web Services, SQL, SPARQL
- Applications: Server Side, Rich Client, Browser

STAR outcomes
- Aim: "To investigate the potential of semantic terminology tools for widening access to digital archaeology resources, including disparate data sets and associated grey literature"
- Research Demonstrator
- Rich semantic indexing of OASIS grey literature reports
- More specific focus than the complementary ADS Archaeotools project

Background and Definitions
Information Extraction
- Information Extraction (IE) is a Natural Language Processing technique, defined as a text analysis task aimed at extracting targeted information from context.
- It is a process in which a textual input is analysed to form a textual output suitable for further manipulation.
Semantic Annotation
- A specific metadata generation and usage schema (usually described by an ontology), aimed at automating the identification of concepts and their relationships in documents.

Rule Based Information Extraction
- Aims to enable 'rich', semantic-aware indexing of archaeology fieldwork reports (grey literature) with respect to the CRM-EH Conceptual Reference Model (ontology)
- Grey literature: source materials that cannot be found through the conventional means of publication
- OASIS: Online AccesS to the Index of archaeological investigationS
  - Coordinated by ADS
  - Online index to archaeological grey literature
  - Accessed via the ADS ArchSearch online service

General Architecture for Text Engineering (GATE)
- Infrastructure for processing human language
- Provides an architecture, a framework and a development environment for developing and deploying natural language software components
[Diagram labels: ADS – OASIS Grey Literature; Ontology (CIDOC CRM-EH); EH Thesaurus; Gazetteer Lists; Java Pattern Engine; XML structures to represent semantic properties]

Rule Based Information Extraction
- Java Annotation Pattern Engine (JAPE): provides finite-state transduction over annotations based on regular expressions (patterns, rules).
- IE pipeline: consists of a set of phases, a cascading mechanism that sequentially runs a set of JAPE rules and text analysis processes (tokenization, POS tagging, etc.)
Annotation pattern examples (from the slide figure):
- E49 E49: "Late Bronze Age or Early Iron Age"
- E49 E19: "prehistoric pottery"
- E53 ( / ): "Ditch containing prehistoric pottery"
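The cascading idea can be illustrated with a small Python sketch. It is not JAPE and does not reproduce the project's rules: each phase only sees a chosen set of annotation types (loosely mirroring a JAPE phase's input declaration), matches a fixed sequence of types, and adds a new annotation that later phases can use. The annotation types `Contain`, `DatedFind` and `DepositionEvent` are hypothetical names invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    type: str   # e.g. "E49", "E19", "E53" as on the slide
    start: int  # character offsets into the text
    end: int

def apply_rule(annotations, input_types, pattern, result_type):
    """Match a sequence of annotation types among the visible annotations."""
    visible = sorted((a for a in annotations if a.type in input_types),
                     key=lambda a: (a.start, a.end))
    matches = []
    i = 0
    while i + len(pattern) <= len(visible):
        window = visible[i:i + len(pattern)]
        if tuple(a.type for a in window) == tuple(pattern):
            # The new annotation spans the whole matched sequence.
            matches.append(Annotation(result_type, window[0].start, window[-1].end))
            i += len(pattern)
        else:
            i += 1
    return matches

def run_pipeline(annotations, phases):
    """Run phases in order; each phase sees the output of earlier phases."""
    current = list(annotations)
    for input_types, pattern, result_type in phases:
        current.extend(apply_rule(current, input_types, pattern, result_type))
    return current

text = "Ditch containing prehistoric pottery"
seed = [
    Annotation("E53", 0, 5),       # "Ditch"
    Annotation("Contain", 6, 16),  # "containing" (hypothetical type)
    Annotation("E49", 17, 28),     # "prehistoric"
    Annotation("E19", 29, 36),     # "pottery"
]
phases = [
    ({"E49", "E19"}, ("E49", "E19"), "DatedFind"),
    ({"E53", "Contain", "DatedFind"}, ("E53", "Contain", "DatedFind"), "DepositionEvent"),
]
result = run_pipeline(seed, phases)
for a in result:
    print(a.type, repr(text[a.start:a.end]))
```

The second phase can only fire because the first phase has already produced a `DatedFind` annotation, which is the point of running the rules as a cascade.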

The KBIE Process
The Knowledge Based Information Extraction (KBIE) process is completed in three phases:
- Pre-processing: aims to identify parts/sections that assist further processing of the document. Not related to the ontology.
- Named Entity Recognition (NER): aims to exploit the KR and to provide Lookup-based mentions of selected concepts. Related to the CIDOC-CRM ontology.
- Event Recognition (Relation Extraction): aims to identify connections between previously identified CIDOC-CRM entities in text. Related to the CRM-EH ontology.

Pre-processing Phase
The pre-processing phase extracts the following document sections:
- Headings
- Negation Phrases
- Summary Sections
- Noun Phrases
- Verb Phrases
All of the above types are used by the second phase (NER) to validate Lookup matches generated by the GATE gazetteers.
The ANNIE application of GATE is used to produce the noun and verb phrases.

Named Entity Recognition (NER)
The NER phase extracts the following annotation types with respect to the CIDOC-CRM:
- E4.Period
- E19.Physical Object
- E53.Place
- E57.Material
The phase supports disambiguation between Material and Physical Object Lookups, based on word-pair disambiguation and on Part of Speech (POS) tags: determiners and adjectives.
- No Lookup is generated for matches that belong to Heading, Tabular Data or Negation sections
- All Lookup matches must be parts of noun phrases
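The two validation constraints (no Lookup inside excluded sections, every Lookup inside a noun phrase) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the GATE implementation: the gazetteer entries, the example sentence and the hand-marked noun phrase and negation spans are all invented, and the Material/Physical Object disambiguation step is left out.

```python
import re

def find_lookups(text, gazetteer):
    """Return (start, end, matched_text, crm_type) for whole-word matches."""
    hits = []
    for term, crm_type in gazetteer.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), crm_type))
    return sorted(hits)

def inside(span, regions):
    """True if span lies completely within any (lo, hi) region."""
    start, end = span
    return any(lo <= start and end <= hi for lo, hi in regions)

def validate(hits, noun_phrases, excluded):
    """Keep hits that sit in a noun phrase and outside excluded sections."""
    return [h for h in hits
            if inside(h[:2], noun_phrases) and not inside(h[:2], excluded)]

def span_of(text, phrase):
    """Offsets of the first occurrence of phrase (helper for this example)."""
    i = text.index(phrase)
    return (i, i + len(phrase))

text = "No pottery was found. The ditch contained Roman pottery."
gazetteer = {"pottery": "E19", "ditch": "E53", "Roman": "E4"}  # toy entries

# In the real pipeline these spans come from the pre-processing phase;
# here they are marked by hand.
noun_phrases = [span_of(text, p) for p in ("No pottery", "The ditch", "Roman pottery")]
negations = [span_of(text, "No pottery was found.")]

kept = validate(find_lookups(text, gazetteer), noun_phrases, negations)
for start, end, matched, crm_type in kept:
    print(crm_type, matched, (start, end))
```

The first "pottery" is found by the gazetteer but discarded because it falls inside the negation phrase, while the second one is kept.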

NER - Semantic Expansion over Thesauri
- How much to use from the available thesauri structures?
- IE pipeline configurable to run in 4 different modes
- EH Glossary terms used as entry points
- Synonym: glossary terms plus their synonyms
- Hyponym: Synonym plus narrower terms
- Hypernym: Hyponym plus broader terms
- All available resources
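A minimal Python sketch of the cumulative expansion modes is given below. The thesaurus content is a toy example invented for illustration (it is not taken from the EH thesauri), and the sketch assumes that narrower and broader terms are followed one level only, which the slides do not specify.

```python
MODES = ("synonym", "hyponym", "hypernym", "all")

def expand(entry_terms, thesaurus, mode):
    """Return the set of terms the gazetteer would hold in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    terms = set(entry_terms)
    if mode == "all":
        # Everything the thesaurus knows about, regardless of entry points.
        for term, relations in thesaurus.items():
            terms.add(term)
            for related in relations.values():
                terms.update(related)
        return terms
    level = MODES.index(mode)
    for term in entry_terms:
        relations = thesaurus.get(term, {})
        terms.update(relations.get("synonyms", ()))
        if level >= 1:  # hyponym mode and above add narrower terms
            terms.update(relations.get("narrower", ()))
        if level >= 2:  # hypernym mode adds broader terms as well
            terms.update(relations.get("broader", ()))
    return terms

# Toy thesaurus, invented for this example.
thesaurus = {
    "pit": {"synonyms": ["pits"],
            "narrower": ["rubbish pit", "storage pit"],
            "broader": ["cut feature"]},
    "ditch": {"synonyms": [], "narrower": [], "broader": ["cut feature"]},
}
glossary = ["pit"]  # entry points

for mode in MODES:
    print(mode, sorted(expand(glossary, thesaurus, mode)))
```

Each mode yields a superset of the previous one, so a larger vocabulary can find more mentions but also has more opportunity to match irrelevant text.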

Targeting CRM-EH Entities and Events
The third phase extracts a range of CRM-EH entities and events, including:
- Context
- Context Find
- Context Event Time Appellation
- Context Find Material
- Context Find Time Appellation
- Context Event
- Context Find Production Event
- Context Find Deposition Event

Meaningful Connections between Entities
- Context Event Time Appellation with Context, e.g. "Roman deposits"
- Context Find Production Event Time Appellation with Context Find, e.g. "Mediaeval Pottery"
- Context Find Deposition Event: Context with Context Find, e.g. "Ditch containing coins"
- Consists of: Material and Context Find, e.g. "Copper alloy artefacts"

CRM-EH Entities and Events (Example)

CRM-EH Entities and Events (XML output)

CRM-EH Entities and Events (RDF triples)

Using the IE Output
- The STAR demonstrator
  - Makes use of the decoupled RDF files
  - Cross searching between grey literature and datasets
  - A SPARQL engine supports the semantic search
- Semantic search examples
  - Context of type X containing Find of type Y: "hearth" containing "coin"
  - Context Find of type X within Context of type Y: "Animal Remains" within "pit", e.g. "the test pit produced a range of artefactual material which included animal bone"
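The shape of the "Context of type X containing Find of type Y" query can be sketched in plain Python over a handful of triples. The demonstrator itself uses a SPARQL engine over RDF; the identifiers and the predicate names `type` and `contains` below are invented stand-ins and are not CRM-EH properties.

```python
# Toy triples (subject, predicate, object); all names are hypothetical.
triples = {
    ("ctx1", "type", "hearth"),
    ("find1", "type", "coin"),
    ("ctx1", "contains", "find1"),
    ("ctx2", "type", "pit"),
    ("find2", "type", "animal remains"),
    ("ctx2", "contains", "find2"),
    ("ctx3", "type", "pit"),
    ("find3", "type", "coin"),
    ("ctx3", "contains", "find3"),
}

def contexts_containing(triples, context_type, find_type):
    """Return (context, find) pairs where a context of one type contains a find of another."""
    typed = {(s, o) for s, p, o in triples if p == "type"}
    return sorted(
        (ctx, find)
        for ctx, p, find in triples
        if p == "contains"
        and (ctx, context_type) in typed
        and (find, find_type) in typed
    )

print(contexts_containing(triples, "hearth", "coin"))
print(contexts_containing(triples, "pit", "animal remains"))
```

Because annotations from grey literature and records from datasets are expressed against the same ontology, a single query of this shape can return results from both kinds of source.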

Example of Grey Literature Annotations

Evaluation
- Information Extraction challenges
  - Domain-specific issues
  - Language ambiguity
  - False positives
  - Coverage of knowledge base resources
- Evaluation of IE
  - Input from experts is critical
  - Basis for assessing and improving the IE system
- Evaluation method
  - The 'gold standard' is a test set of human-annotated documents
  - It is used for comparison with the automatic annotations produced by the system
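Comparison against a gold standard is usually summarised with Precision, Recall and F-measure. The Python sketch below assumes strict matching (same offsets and same type) and the balanced F1 score; the slides do not say whether partial matches were credited, and the example annotations are invented.

```python
def precision_recall_f1(gold, system):
    """Score system annotations against gold annotations using exact matches."""
    gold, system = set(gold), set(system)
    true_positives = len(gold & system)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Annotations as (start, end, type); invented values for illustration.
gold = {(0, 5, "E53"), (17, 28, "E49"), (29, 36, "E19"), (40, 46, "E57")}
system = {(0, 5, "E53"), (17, 28, "E49"), (29, 36, "E19"),
          (50, 55, "E19"), (60, 66, "E53")}

p, r, f = precision_recall_f1(gold, system)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here the system finds three of the four gold annotations (Recall 0.75) but also produces two spurious ones (Precision 0.60), which is the kind of trade-off the evaluation measures.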

Evaluation Results
[Table: Precision, Recall and F-Measure against annotators KM, PC and TB, for the Synonym, Hyponym, Hypernym and All Resources modes]

Evaluation Results
[Table: Inter Annotator Agreement score (Precision, Recall, F-Measure) for KM-PC-TB, KM-PC, KM-TB and PC-TB]

Evaluation Results
- Results validate the initial hypothesis about optimum semantic expansion
  - Too little expansion causes Recall to suffer
  - Too much expansion causes Precision to suffer
- The best performance (for 2 out of 3 annotators) is in the Hyponym mode
- Trade-off between Hyponym and Hypernym in Precision and Recall

Evaluation Results
- Annotators agree 65%; this is a good percentage
  - Archaeotools reported IAA around 60%
  - IAA in Cultural Heritage is usually low
- KM and PC agree 69%
- KM and TB agree 67%
- TB and PC agree 61%
- Not all annotators distinguished Material from Physical Object
- Time appellations such as "Phase" or "episode of flooding" cause disagreement

Questions and References
- Questions?
- URLs
