Benchmarking ontology-based annotation tools for the Semantic Web Diana Maynard University of Sheffield, UK.

What?
- Work in the context of the EU Network of Excellence KnowledgeWeb
- Case studies in the field of bioinformatics
- Developing benchmarking tools and test suites for ontology generation and evaluation
  – New metrics for evaluation
  – New visualisation tools
  – Development of usability criteria

Why?
- Increasing interest in the use of ontologies in bioinformatics as a means of accessing information automatically from large databases
- Ontologies such as GO enable annotation and querying of large databases such as SWISS-PROT
- Methods for IE have become extremely important in these fields
- Development of OBIE applications is hampered by a lack of standardisation and of suitable metrics for testing and evaluation
- The main focus until now has been on performance rather than on practical aspects such as usability and accessibility

Gene Ontology
- Collaborative ontology construction has been practised in the gene ontology community for longer than in most other communities, which makes it a good case study for testing applications and metrics
- Used in KnowledgeWeb to show that the state-of-the-art tools supporting communities creating their own ontologies can be further advanced by suitable evaluation techniques, amongst other things

Automatic Annotation Tools
- Semantic annotation is used to create metadata linking the text to one or more ontologies
- Enables us to combine and associate existing ontologies, to perform more detailed analysis of the text, and to extract deeper and more accurate knowledge
- Semantic annotation generally relies on ontology-based IE techniques
- Suitable evaluation metrics and tools for these new techniques are currently lacking

Requirements for Semantic Annotation Tools
- Expected functionality: level of automation, target domain, text size, speed
- Interoperability: ontology format, annotation format, platform, browser
- Usability: installation, documentation, ease of use, aesthetics
- Accessibility: flexibility of design, input and display alternatives
- Scalability: text and ontology size
- Reusability: range of applications

Performance Evaluation Metrics
- Evaluation metrics mathematically define how to measure the system's performance against a human-annotated gold standard
- A scoring program implements the metric and provides performance measures
  – for each document and over the entire corpus
  – for each type of annotation
  – may also evaluate changes over time
- A gold standard reference set also needs to be provided – this may be time-consuming to produce
- Visualisation tools show the results graphically and enable easy comparison
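A minimal sketch of what such a scoring program does: compare gold-standard and system annotations per document, then sum counts over the corpus. The function names and the (span, type) annotation representation are hypothetical simplifications, not the GATE implementation.

```python
def score_document(gold, response):
    """Compare one document's gold-standard and system annotations.

    Annotations are modelled as (span, type) tuples for simplicity.
    """
    gold, response = set(gold), set(response)
    return {
        "correct": len(gold & response),       # exact matches
        "spurious": len(response - gold),      # system annotations not in the gold standard
        "missing": len(gold - response),       # gold annotations the system failed to produce
    }

def score_corpus(documents):
    """Sum per-document counts over the whole corpus (micro-averaging)."""
    totals = {"correct": 0, "spurious": 0, "missing": 0}
    for gold, response in documents:
        for key, value in score_document(gold, response).items():
            totals[key] += value
    return totals
```

Per-annotation-type breakdowns fall out of the same scheme by filtering the tuples on their type field before scoring.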

GATE AnnotationDiff Tool

Correct and incorrect instances attached to concepts

Evaluation of instances by source

Methods of evaluation
- Traditional IE is evaluated in terms of Precision, Recall and F-measure
- These are not sufficient for ontology-based IE, because the distinction between right and wrong is less clear-cut
- Recognising a Person as a Location is clearly wrong, but recognising a Research Assistant as a Lecturer is not so wrong
- Similarity metrics need to be integrated so that wrong items closer together in the hierarchy are given a higher score
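For reference, the traditional measures can be sketched as below; the ontology-aware variants discussed later keep this shape but replace the raw correct count with a graded score.

```python
def precision(correct, spurious):
    """Fraction of system responses that are correct."""
    return correct / (correct + spurious) if correct + spurious else 0.0

def recall(correct, missing):
    """Fraction of gold-standard items the system found."""
    return correct / (correct + missing) if correct + missing else 0.0

def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta=1)."""
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```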

Learning Accuracy
- LA [Hahn98] was originally defined to measure how well a concept had been added at the right level of the ontology, i.e. ontology generation
- Later used to measure how well an instance has been added in the right place in the ontology, i.e. ontology population
- Main snag: it does not consider the height of the Key concept, only the height of the Response concept
- This also means that similarity is not bidirectional, which is counter-intuitive

Balanced Distance Metric
- We propose BDM as an improvement over LA
- Considers the relative specificity of the taxonomic positions of the key and response
- Does not distinguish the directionality of this relative specificity: the Key can be a specific concept (e.g. 'car') and the Response a general concept (e.g. 'relation'), or vice versa
- Distances are normalised with respect to the average chain length
- This makes the penalty for node traversal relative to the semantic density of the concepts in question

BDM – the metric
- BDM is calculated for all correct and partially correct responses
- CP = distance from the root to the MSCA (the most specific common abstraction of the key and response concepts)
- DPK = distance from the MSCA to the Key
- DPR = distance from the MSCA to the Response
- n1 = average length of the set of chains containing the key or the response concept, computed from the root concept
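The quantities above combine into the published BDM formula (reconstructed here from the associated Maynard et al. paper, as the slide itself gives no equation). BR is a branching-factor penalty, and n2 and n3, defined analogously to n1, are the average lengths of the chains containing the key and the response concept respectively; neither BR, n2 nor n3 appears on the slide, so treat them as assumptions of this reconstruction.

```latex
\mathrm{BDM} =
  \frac{\mathrm{BR}\cdot\frac{CP}{n_1}}
       {\mathrm{BR}\cdot\frac{CP}{n_1} + \frac{DPK}{n_2} + \frac{DPR}{n_3}}
```

BDM ranges over (0, 1], reaching 1 when the response coincides with the key (DPK = DPR = 0), and shrinking as either concept moves further from their common abstraction.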

Augmented Precision and Recall
- BDM is integrated with traditional Precision and Recall in the following way:
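The formula itself is missing from the transcript; reconstructed from the associated paper, the integration is (here BDM denotes the sum of the per-response BDM scores over correct and partially correct responses, and n the number of correct plus partially correct responses):

```latex
AP = \frac{\mathrm{BDM}}{n + \mathrm{Spurious}},
\qquad
AR = \frac{\mathrm{BDM}}{n + \mathrm{Missing}}
```

With every BDM score equal to 1 (all matches exact) these reduce to standard Precision and Recall.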

Conclusions
- Semantic annotation evaluation requires:
  – New metrics
  – Usability evaluation
  – Visualisation software
- The bioinformatics field is a good testbench, e.g. evaluation of protein name taggers
- Implementation in GATE
- Knowledge Web benchmarking suite for evaluating ontologies and ontology-based tools

A final thought on evaluation: “We didn’t underperform. You overexpected.”