
Knowtator: A Protégé plug-in for annotated corpus construction

Philip V. Ogren
Division of Biomedical Informatics, Mayo Clinic College of Medicine, Rochester, Minnesota, USA

Synopsis

Knowtator is a general-purpose text annotation tool.
Knowtator is a Protégé plug-in.
Knowtator is open source and available at: bionlp.sourceforge.net/Knowtator

Introduction

A general-purpose text annotation tool called Knowtator is presented. Knowtator facilitates the manual creation of annotated corpora that can be used for evaluating or training a variety of natural language processing systems. Building on the strengths of the widely used Protégé knowledge representation system, Knowtator has been developed as a Protégé plug-in that leverages Protégé’s knowledge representation capabilities to specify annotation schemas. Knowtator’s unique advantage over other annotation tools is the ease with which complex annotation schemas (e.g. schemas which have constrained relationships between annotation types) can be defined and incorporated into use.

Acknowledgements

Larry Hunter, PhD (1); Zhiyong Lu (1); Kevin Cohen (1); Mike Bada (1); Andrew Dolbey (1); Christopher G. Chute, MD DrPH (2); Guergana Savova, PhD (2); Serguei Pakhomov, PhD (2); Marcelline R. Harris, PhD (2)
1. University of Colorado Health Sciences Center, Aurora, CO.
2. Mayo Clinic College of Medicine, Rochester, MN.

Example

The following outlines an example of how Knowtator can be used to annotate problem statements, outcomes, and interventions that are found in clinical notes. The annotation schema shown in Knowtator is based on the International Classification for Nursing Practice (ICNP), a controlled vocabulary and data model created specifically for coding in this domain.

Annotation Schema Creation: The Protégé knowledge-base editor can be used to create new class (Figure 1), instance, slot (Figures 2 and 3), and facet frames for defining the annotation schema.
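
The idea of a schema with constrained slots can be illustrated outside Protégé with a minimal Python sketch. This is not Protégé's or Knowtator's API: `SchemaClass` and `slot_value_ok` are hypothetical names invented for illustration, and making Problem Statement a subclass of Statement is an assumption. The class and slot names come from the ICNP example, where the action of a Problem Statement must be of type Action and its location refers to a Body Structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SchemaClass:
    """A class frame in a toy annotation schema (hypothetical, not Protégé's API)."""
    name: str
    parent: Optional["SchemaClass"] = None
    # Maps a slot name to the class that values of that slot must belong to.
    slots: Dict[str, "SchemaClass"] = field(default_factory=dict)

    def is_a(self, other: "SchemaClass") -> bool:
        """Return True if this class is `other` or one of its descendants."""
        cls: Optional[SchemaClass] = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

    def allowed_type(self, slot: str) -> Optional["SchemaClass"]:
        """Find the constraint on `slot`, searching this class and then its ancestors."""
        cls: Optional[SchemaClass] = self
        while cls is not None:
            if slot in cls.slots:
                return cls.slots[slot]
            cls = cls.parent
        return None


def slot_value_ok(owner: SchemaClass, slot: str, value_class: SchemaClass) -> bool:
    """Check that a value of class `value_class` may fill `slot` on `owner`."""
    required = owner.allowed_type(slot)
    return required is not None and value_class.is_a(required)


# Class names follow the ICNP example; the hierarchy itself is assumed.
statement = SchemaClass("Statement")
action = SchemaClass("Action")
body_structure = SchemaClass("Body Structure")
problem_statement = SchemaClass(
    "Problem Statement",
    parent=statement,
    slots={"action": action, "location": body_structure},
)

# A Body Structure may fill the location slot but not the action slot.
print(slot_value_ok(problem_statement, "location", body_structure))
print(slot_value_ok(problem_statement, "action", body_structure))
```

Checking the constraint when a slot is filled, rather than afterwards, is what keeps relationships between annotations consistent with the schema.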
Figure 1: The creation of a subclass of Statement in progress using the Protégé class editor.
Figure 2: The class definition for Problem Statement, shown with its slots and the constraints on those slots (e.g. an action of a Problem Statement must be of type Action).
Figure 3: The only slot of the class Artifact is a simple attribute that accepts a string value corresponding to an identifier for a term in the ICNP controlled vocabulary.

Annotation of Text: Once an annotation schema has been created, it can be used immediately for text annotation. Figure 4 shows some text that is going to be annotated. On the left is the subsumption hierarchy of the available annotation types. A single annotation has been created for the span of text ‘pain’ and is annotated to the class Process.

Figure 4: The text ‘pain’ was highlighted with the mouse, the class Process was selected, and an annotation was created.
Figure 5: The annotation corresponding to the text ‘pain’ has a slot that relates this annotation to a specific identifier in the ICNP terminology. A dialog that allows the entry of a string value for the identifier is shown.
Figure 6: An annotation corresponding to the class Problem Statement has been created. There is no span associated with the annotation. However, Problem Statement has several slots (shown in Figure 2) that correspond to other annotations in the text. The annotation for the span of text ‘parascapular thoracic’ with the class Body Structure becomes the value of the location slot of the Problem Statement annotation.

The slots of the class definitions in the annotation schema define what properties an annotation can have. Figure 5 shows an example of a simple slot that holds the value of an identifier from a controlled vocabulary for an annotation of the class Process.
Figure 6 shows an example of a complex slot that relates an annotation of type Problem Statement to an annotation of type Body Structure via the location slot.

A key strength of Knowtator is its ability to relate annotations to each other via the slot definitions of the corresponding annotated classes. In the ICNP example above, the location slot of the class Problem Statement relates to the Body Structure annotation for the text extent ‘parascapular thoracic’. The constraints on the slot ensure that the relationships between annotations are consistent. Protégé is capable of representing much more sophisticated and complex conceptual models, which can in turn be used by Knowtator for text annotation. Also, because Protégé is often used to create conceptual models of domains relating to biomedical disciplines, Knowtator is especially well suited for capturing named entities and their relations in those domains.

Features

Merges annotations from multiple annotators
Computes a variety of inter-annotator agreement metrics, along with detailed error analysis data
Consensus set creation mode for consolidating differences between two or more annotators
Pluggable architecture for handling different text sources
Stand-off annotation (i.e. the annotated text is not modified)
XML import/export
Scalable: can run on a standalone laptop or with a database backend (or both)
Mozilla Public License version 1.1
Filters provide fine-grained control over display, annotation export, consensus set creation, and inter-annotator agreement
Display of annotations is highly configurable with respect to the text shown and highlight color
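
Two of these features, stand-off annotation and inter-annotator agreement, can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not Knowtator's implementation: the sample sentence, the `Annotation` type, and the helpers `annotate` and `agreement` are all hypothetical, and the metric shown (exact match on span and class, scored as twice the number of matches over the total number of annotations) is one simple choice among the many a tool could offer.

```python
from dataclasses import dataclass
from typing import Set

# Hypothetical sample text; only the phrases 'pain' and
# 'parascapular thoracic' come from the example.
TEXT = "Patient reports parascapular thoracic pain."


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: it points into the text and never alters it."""
    start: int  # character offset where the span begins
    end: int    # character offset just past the end of the span
    label: str  # annotation class name, e.g. "Process"


def annotate(text: str, phrase: str, label: str) -> Annotation:
    """Build an annotation for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label)


def agreement(a: Set[Annotation], b: Set[Annotation]) -> float:
    """Score exact span-and-class matches as 2 * |matches| / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Two hypothetical annotators agree on 'pain' but pick different spans
# for the body structure.
annotator_a = {
    annotate(TEXT, "pain", "Process"),
    annotate(TEXT, "parascapular thoracic", "Body Structure"),
}
annotator_b = {
    annotate(TEXT, "pain", "Process"),
    annotate(TEXT, "thoracic", "Body Structure"),
}

pain = annotate(TEXT, "pain", "Process")
print(TEXT[pain.start:pain.end])            # the span is recovered from offsets
print(agreement(annotator_a, annotator_b))  # one match out of two annotations each
```

Because the annotations store only offsets and class names, the source text stays untouched, and sets of annotations from different annotators can be compared or merged directly.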

