
Type Systems, Interoperability and Database Population Eric Nyberg, CMU Shilpa Arora, CMU Lance Ramshaw, BBN.


1 Type Systems, Interoperability and Database Population Eric Nyberg, CMU Shilpa Arora, CMU Lance Ramshaw, BBN

2 Outline
Annotation sample analysis
  – emergent type systems
  – ongoing issues / clarification questions
Data interoperability
Database population
  – CMU’s Annotations DB
  – OntoNotes
  – Possible architecture for interoperability with UIMA annotators
  – Issues for Discussion

3 Task
Analyze sample outputs from different annotation groups
Formalize annotation type system (UML object model) for each sample
Generate clarification questions
Work toward a unified type system
Work toward interoperability architecture
Status legend: In progress, not finished; Not started

4 For each annotation sample:
Overview of what we received
Brief example annotation
Type system analysis
Issues / Questions

5 What's in the bin?
#    Annotation                                  Manual / Samples / Analysis / Type System
1.1  CMU Belief Annotations                      xxxx
1.2  CMU Event Coreference Annotations           x
2.1  Ed Hovy's Group - Noun Sense Annotation     xxx
3.1  BBN Temporal Ordering Annotation            xxxx
3.2  BBN Name Annotations                        xxx
3.3  BBN Coreference Annotation                  xxx
3.4  BBN (Complex) Coreference Annotation        xxxx
4.1  UMBC Modality Annotation                    xxxx
5.1  Columbia Dialog Annotation                  x

6 CMU/Columbia Belief Annotation
Annotation Manual:
  – Davis et al., “Annotating belief in Communication: Manual”
Annotation Units: Propositions identified by PropBank and NomBank

7 CMU/Columbia Belief Annotation
Three categories:
  – Committed belief: Belief expressed in the utterance. Can be a proposition about the present or the future.
    E.g. (1) I know Mark and Sandra have eloped. (2) The sun will rise again. (Future)
  – Non-committed belief: Not a strong belief. Can be a proposition about the present or the future.
    E.g. (1) Mark and Sandra may have eloped. (2) John may return tomorrow.
  – Not applicable: Not a belief.
    E.g. (1) I wish Mark and Sandra would finally elope.

8 CMU/Columbia Belief Annotation
Five Classes:
  – Committed Belief
  – Committed Belief Future
  – Non-Committed Belief
  – Non-Committed Belief Future
  – Not Applicable
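The relationship between the three categories and the five classes can be sketched as a small enumeration. This is only an illustrative sketch: the names `BeliefCategory`, `BeliefClass`, `category` and `is_future` are assumptions made here, not the type system shown on the following slides.

```python
from enum import Enum


class BeliefCategory(Enum):
    """The three categories named in the annotation manual summary."""
    COMMITTED = "Committed belief"
    NON_COMMITTED = "Non-committed belief"
    NOT_APPLICABLE = "Not applicable"


class BeliefClass(Enum):
    """The five classes: each is a (category, is_future) pair (hypothetical encoding)."""
    COMMITTED_BELIEF = (BeliefCategory.COMMITTED, False)
    COMMITTED_BELIEF_FUTURE = (BeliefCategory.COMMITTED, True)
    NON_COMMITTED_BELIEF = (BeliefCategory.NON_COMMITTED, False)
    NON_COMMITTED_BELIEF_FUTURE = (BeliefCategory.NON_COMMITTED, True)
    NOT_APPLICABLE = (BeliefCategory.NOT_APPLICABLE, False)

    @property
    def category(self):
        # First element of the value tuple is the coarse category.
        return self.value[0]

    @property
    def is_future(self):
        # Second element records whether the proposition is about the future.
        return self.value[1]
```

For example, "The sun will rise again." would map to `BeliefClass.COMMITTED_BELIEF_FUTURE`, whose `category` is `BeliefCategory.COMMITTED`.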

9 CMU/Columbia Belief Annotation: Type System (1)

10 CMU/Columbia Belief Annotation: Type System (2)

11 CMU/Columbia Belief Annotation: Type System (3)

12 Follow-up questions
Extensions:
  – What extensions do we expect to the annotation scheme?
  – How can we best tailor the type system toward expected future changes?
Requirements from the application domain:
  – Do we have a set of requirements from the application side?

13 Ed Hovy’s group
Annotations:
  – Annotated with OntoNotes for noun senses
  – 205 nouns, one file per noun; each file stores the sense and the location of each occurrence of that noun
Sample annotations:
  – eng/AFGP-2002-600175-Trans.txt 427 4 position-n@0.0 3 Mon Dec 3 02:31:27 2007
  – eng/AFGP-2002-602187-Trans.txt 25 6 position-n@0.0 2 Mon Dec 3 02:31:27 2007
Interpretation:
  – Noun="position", sense=3; file=AFGP-2002-600175-Trans.txt, position=“427 4”
  – Noun="position", sense=2; file=AFGP-2002-602187-Trans.txt, position=“25 6”
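A minimal sketch of how one of these sample lines might be parsed, assuming the fields are whitespace-separated in the order shown: file path, two position numbers, a `lemma-pos@...` key, the sense number, and a timestamp. The `SenseAnnotation` class and `parse_sense_line` function are hypothetical names, the meaning of the two position numbers is not stated on the slide, and the timestamp parsing assumes an English locale.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SenseAnnotation:
    file: str          # file name without the leading directory
    position: tuple    # the two position numbers, kept as given
    lemma: str
    pos: str           # part-of-speech letter from the key, e.g. "n"
    sense: int
    timestamp: datetime


def parse_sense_line(line):
    """Parse one sample line into a SenseAnnotation (assumed field order)."""
    parts = line.split()
    path, first, second, key, sense = parts[:5]
    # "position-n@0.0" -> lemma "position", part of speech "n"
    lemma, pos = key.split("@", 1)[0].rsplit("-", 1)
    stamp = datetime.strptime(" ".join(parts[5:]), "%a %b %d %H:%M:%S %Y")
    return SenseAnnotation(
        file=path.split("/")[-1],
        position=(int(first), int(second)),
        lemma=lemma,
        pos=pos,
        sense=int(sense),
        timestamp=stamp,
    )


example = parse_sense_line(
    "eng/AFGP-2002-600175-Trans.txt 427 4 position-n@0.0 3 Mon Dec 3 02:31:27 2007"
)
```

Here `example.lemma` is `"position"`, `example.sense` is `3`, and `example.position` is `(427, 4)`, matching the interpretation given on the slide.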

14 Type System (Ed Hovy et al. Annotation)

15 BBN
1. BBN TTO-3 Temporal Ordering Annotation
2. BBN Name Annotations: named entities (org, date, per, etc.)
3. BBN-Coref-Annotation: entity (with type), entity mentions, etc.
4. BBN-complex-coref-annotation

16 Temporal Relationship Assignment
Text       ID   TT   TP   TR
11/28      1    D    S    2A
Arrived    2    E    P    0B
yesterday  3    D    S    2C
told       4    S    P    2B
Visiting   5    E    UN   4A
left       6    E    P    4A
Return     7    E    F    2A
Monday     8    D    S    7C
is         9    B    C    0C
Return     10   E    F    9A
day        11   D    U    10C

17 Type System (BBN Temporal Ordering Annotation)

18 BBN Name Annotations (Type system)

19 BBN-complex-coref-annotation
Annotations: Relations between entities
  – Member
  – Member Base
  – Subset
  – Subset Size (future type system)
Other annotations: Attributes of a mention
  – Reference type
  – Syntactic Context

20 Type System for BBN (Complex) coreference annotation

21 Type System for BBN (Complex) coreference annotation (contd.)

22 UMBC Modality Annotations
TMR: Text Meaning Representation, i.e. the concepts annotated
Main annotation: Modality. It has four main attributes: TYPE, VALUE, SCOPE and ATTRIBUTED-TO
TMRs can be nested, i.e. attributes or relations can refer to other TMRs

23 UMBC Modality Annotations

24 Interoperability: Data
Common data model
Multiple implementations
  – based on the same underlying schema (formal object model)
  – meet different goals / requirements
Implementation criteria:
  – support effective run-time annotation (e.g. UIMA type system)
  – support effective user interface, query/update (e.g. OntoNotes)
  – support on-the-fly schema extension (e.g. CMU’s Annotations DB)

25 Interoperability: Data [2]
Formal object model is mapped to:
  – UIMA type system definition (create)
  – OntoNotes RDBMS schema (extend)
  – CMU’s Annotations DB (extend)
Annotated data can be represented in any format that implements the formal model
“Have your cake and eat it too”
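The idea that one formal model can back several storage formats can be illustrated with a toy round trip. This is a sketch under stated assumptions: the `Annotation` class, its fields, and the XML and row layouts are invented here for illustration and are not the UIMA, OntoNotes, or Annotations DB formats.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical shared model: a typed span plus (name, value) features."""
    type: str
    begin: int
    end: int
    features: tuple = ()


def to_xml(ann):
    """Serialize to a small XML element (file-style storage)."""
    elem = ET.Element("annotation", {"type": ann.type,
                                     "begin": str(ann.begin),
                                     "end": str(ann.end)})
    for name, value in ann.features:
        ET.SubElement(elem, "feature", {"name": name, "value": value})
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    elem = ET.fromstring(text)
    feats = tuple((f.get("name"), f.get("value")) for f in elem.findall("feature"))
    return Annotation(elem.get("type"), int(elem.get("begin")), int(elem.get("end")), feats)


def to_row(ann):
    """Serialize to a flat tuple (relational-style storage)."""
    return (ann.type, ann.begin, ann.end, json.dumps(ann.features))


def from_row(row):
    type_, begin, end, feats = row
    return Annotation(type_, begin, end, tuple(tuple(pair) for pair in json.loads(feats)))
```

Because both representations are derived from the same model, an annotation written in one format can be read back and re-emitted in the other without loss, which is the property the slide is after.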

26 CMU’s Annotations Database
MySQL implementation
Java APIs (SQL connection API and simple object access API)
Fully integrated with UIMA
Used on DTO and DARPA projects
PRO: tag types can be extended at run time by the application (schema supports open-ended type definition)
CON: interactive tools are currently limited

27 Annotations Database (JAVELIN Project Briefing, AQUAINT Program)
Example passage: In an interview with Defense News, Indian Defence Research and Development Organization (DRDO) scientists said India was launching a comprehensive plan to develop a wide range of modern nuclear missiles. Within two years, India would develop an intercontinental ballistic missile (ICBM),...
[Schema diagram: document (datetime, docno, doctype); passage (text); tag (type, value, parent); span (offset, length); links between entities are marked with *]
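A rough sketch of how a schema with these entities could let an application introduce new tag types at run time. The original is a MySQL implementation with Java APIs; the version below uses SQLite from Python purely for illustration. The key columns, the foreign keys, the renamed columns `doc_datetime`, `span_offset` and `span_length`, the helper functions, and the tag types `COUNTRY` and `WEAPON` are all assumptions.

```python
import sqlite3

# Assumed layout: tag.type is free text, so new tag types need no schema change.
SCHEMA = """
CREATE TABLE document (id INTEGER PRIMARY KEY, docno TEXT, doctype TEXT, doc_datetime TEXT);
CREATE TABLE passage  (id INTEGER PRIMARY KEY, document_id INTEGER REFERENCES document(id), text TEXT);
CREATE TABLE tag      (id INTEGER PRIMARY KEY, passage_id INTEGER REFERENCES passage(id),
                       parent INTEGER REFERENCES tag(id), type TEXT, value TEXT);
CREATE TABLE span     (id INTEGER PRIMARY KEY, tag_id INTEGER REFERENCES tag(id),
                       span_offset INTEGER, span_length INTEGER);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_passage(conn, docno, doctype, text):
    doc_id = conn.execute("INSERT INTO document (docno, doctype) VALUES (?, ?)",
                          (docno, doctype)).lastrowid
    return conn.execute("INSERT INTO passage (document_id, text) VALUES (?, ?)",
                        (doc_id, text)).lastrowid


def add_tag(conn, passage_id, fragment, tag_type, value=None, parent=None):
    """Tag the first occurrence of `fragment`; any tag_type string is accepted."""
    (text,) = conn.execute("SELECT text FROM passage WHERE id = ?", (passage_id,)).fetchone()
    offset = text.index(fragment)
    tag_id = conn.execute(
        "INSERT INTO tag (passage_id, parent, type, value) VALUES (?, ?, ?, ?)",
        (passage_id, parent, tag_type, value)).lastrowid
    conn.execute("INSERT INTO span (tag_id, span_offset, span_length) VALUES (?, ?, ?)",
                 (tag_id, offset, len(fragment)))
    return tag_id


def tagged_text(conn, tag_id):
    """Recover the tagged substring from the stored offset and length."""
    row = conn.execute(
        "SELECT substr(p.text, s.span_offset + 1, s.span_length) "  # substr is 1-indexed
        "FROM tag t JOIN passage p ON p.id = t.passage_id "
        "JOIN span s ON s.tag_id = t.id WHERE t.id = ?", (tag_id,)).fetchone()
    return row[0]
```

Calling `add_tag(conn, pid, "India", "COUNTRY")` and then `add_tag(conn, pid, "intercontinental ballistic missile", "WEAPON", value="ICBM")` stores two previously unseen tag types without altering any table definition.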

28 An Integrated Annotation DB in OntoNotes
Sameer Pradhan, Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel
http://www.bbn.com/NLP/OntoNotes

29 Goals
• Capture multiple layers of annotation and modeling
  – Syntax
  – Propositions
  – Word sense
  – Ontology
  – Coreference
  – Names
• Using an integrated relational database representation
  – Enforces consistency across the different annotations
  – Supports integrated models that can combine evidence from different layers

30 Unified Representation
• Provide a bare-bones representation independent of the individual semantics that can
  – Efficiently capture intra- and inter-layer semantics
  – Maintain component independence
  – Provide a mechanism for flexible integration
  – Integrate information at the lowest level of granularity
• A Relational Database

31 Unified Relational Representation
[Diagram components: Corpus, Trees, Coreference, Names, Propositions, Senses]

32 Example: DB Representation of Syntax
Treebank tokens (stored in the Token table) provide the common base
The Tree table stores the recursive tree nodes, each with its span
Subsidiary tables define the sets of function tags, phrase types, etc.
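One way to picture this layout is to flatten a nested parse into a token list plus one row per tree node, with each row carrying a token span. The sketch below is illustrative only: the tuple encoding of the tree, the row layout `(node_id, parent_id, label, start, end)`, and the function name `tree_to_rows` are assumptions, not the actual OntoNotes table definitions.

```python
def tree_to_rows(tree):
    """Flatten a nested (label, children) tree into tokens and node rows.

    Each row is (node_id, parent_id, label, start, end), where start/end are
    token indices and end is exclusive. Leaves are plain strings.
    """
    tokens = []
    rows = []

    def visit(node, parent_id):
        label, children = node
        node_id = len(rows)
        rows.append(None)          # reserve the slot so ids follow pre-order
        start = len(tokens)
        for child in children:
            if isinstance(child, str):
                tokens.append(child)
            else:
                visit(child, node_id)
        rows[node_id] = (node_id, parent_id, label, start, len(tokens))

    visit(tree, None)
    return tokens, rows


# Fragment of the OntoNotes example sentence: "troops in central Europe"
fragment = ("NP", [
    ("NP", [("NNS", ["troops"])]),
    ("PP", [("IN", ["in"]),
            ("NP", [("JJ", ["central"]), ("NNP", ["Europe"])])]),
])
tokens, rows = tree_to_rows(fragment)
```

With spans expressed over a shared token sequence, other layers (names, propositions, coreference) can point at the same nodes or token ranges, which is what makes cross-layer queries possible.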

33 Advantages of an Integrated Representation
• Each layer translates into a common representation
• Clean, consistent layers
  – Resolve the inconsistencies and problems that this reveals
• Well-defined relationships
  – Database schema defines the merged structure efficiently
• Original representations available as predefined views
  – Treebank, PropBank, etc.
• SQL queries can extract examples based on multiple layers or define new views
• Python object-oriented API allows for programmatic access to tables and queries

34 Syntax Layer
• Identifies meaningful phrases in the text
• Lays out the structure of how they are related
Example: Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons, as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon.
[Figure: syntax tree for the fragment “major reductions and realignments of troops in central Europe”, with S, NP and PP phrase nodes over part-of-speech tags]

35 Propositional Structure
• Tells who did what to whom
• For both verbs and nouns
Example: Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons, as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon.
[Figure: syntax tree for the fragment “major reductions and realignments of troops in central Europe”, with argument labels ARG1, ARG2 and ARGM-LOC]

36 Predicate Frames
• Predicate frames define the meanings of the numbered arguments
Example: Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons, as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon.
Frames highlighted in the example: reduce.01 (Make less), aim.02 (Directed motion)
Predicate Frames: aim
  – aim.01 (Plan): ARG0 = Aimer, ARG1 = Action
  – aim.02 (Directed motion): ARG0 = Aimer, ARG1 = Thing in motion, ARG2 = Target
Predicate Frames: reduction
  – reduce.01 (Make less): ARG0 = Agent, ARG1 = Thing falling, ARG2 = Amount fallen, ARG3 = Starting point, ARG4 = Ending point

37 Word Sense and Ontology
• Meanings of nouns and verbs are specified
• All the senses are annotatable at 90% inter-annotator agreement
• Catalog of possible meanings supplied in the sense inventory files
• Ontology links (currently being added) will capture similarities between related senses of different words
Example: Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons, as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon.
Word Sense: aim
  1. Point or direct object, weapon, at something...
  2. Wish, purpose or intend to achieve something
Word Sense: register
  1. Enter into an official record
  2. Be aware of, enter into someone’s consciousness
  3. Indicate a measurement
  4. Show in one’s face
Senses highlighted in the example: aim, sense 2 (Wish, purpose or intend to achieve something); register, sense 1 (Enter into an official record)

38 Coreference
• Identifies different mentions of the same entity in text, especially links between definite, referring noun phrases and pronouns
• Two types are tagged: Identity and Attributive coreference
Example: Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons, as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon.
[Figure: mentions highlighted include “President Bush”, “He”, “conventional arms talk”, “Vienna talks -- which are aimed at ... central Europe”, “the Pentagon” and “Pentagon”; entity label e0]

39 Example of DB Query Function
Question: What is the distribution of named entities that are ARG0s of the predicate “say”?

    from collections import defaultdict

    # a_cursor, a_proposition_bank, a_tree_document and a_tree_id are assumed to be
    # set up earlier through the OntoNotes Python API (not shown on the slide).
    ne_hash = defaultdict(int)  # named-entity type -> frequency

    for a_proposition in a_proposition_bank:
        # Keep only propositions whose predicate is "say"
        if a_proposition.lemma == "say":
            arg_in_p_query = "select * from argument where proposition_id = '%s';" % (a_proposition.id)
            a_cursor.execute(arg_in_p_query)
            argument_rows = a_cursor.fetchall()
            for a_argument_row in argument_rows:
                a_argument_id = a_argument_row["id"]
                a_argument_type = a_argument_row["type"]
                # Keep only the ARG0 arguments
                if a_argument_type == "ARG0":
                    n_in_arg_query = "select * from argument_node where argument_id = '%s';" % (a_argument_id)
                    a_cursor.execute(n_in_arg_query)
                    argument_node_rows = a_cursor.fetchall()
                    for a_argument_node_row in argument_node_rows:
                        a_node_id = a_argument_node_row["node_id"]
                        # Names that span exactly the argument node
                        a_ne_node_query = "select * from name_entity where subtree_id = '%s';" % (a_node_id)
                        a_cursor.execute(a_ne_node_query)
                        ne_rows = a_cursor.fetchall()
                        for a_ne_row in ne_rows:
                            ne_hash[a_ne_row["type"]] += 1
                        # Names that span a subtree inside the argument node
                        a_tree = a_tree_document.get_tree(a_tree_id)
                        a_node = a_tree.get_subtree(a_node_id)
                        for a_child in a_node.subtrees():
                            a_ne_subtree_query = "select * from name_entity where subtree_id = '%s';" % (a_child.id)
                            a_cursor.execute(a_ne_subtree_query)
                            ne_subtree_rows = a_cursor.fetchall()
                            for a_ne_subtree_row in ne_subtree_rows:
                                ne_hash[a_ne_subtree_row["type"]] += 1

Result:
Name Entity     Frequency
Person          84
GPE             34
Organization    29
NORP            15
...
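The same question can also be posed as a single multi-table join. The sketch below runs against a toy in-memory SQLite database, not the real OntoNotes schema: the `argument`, `argument_node` and `name_entity` table and column names are taken from the slide's queries, while the `proposition` table, the key values, and the sample rows are invented for illustration. Unlike the slide's function, it counts only names attached directly to the argument node and does not descend into child subtrees.

```python
import sqlite3
from collections import Counter

TOY_SCHEMA = """
CREATE TABLE proposition   (id TEXT PRIMARY KEY, lemma TEXT);
CREATE TABLE argument      (id TEXT PRIMARY KEY, proposition_id TEXT, type TEXT);
CREATE TABLE argument_node (argument_id TEXT, node_id TEXT);
CREATE TABLE name_entity   (subtree_id TEXT, type TEXT);
"""


def build_toy_db():
    """Create a tiny database with made-up rows for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_SCHEMA)
    conn.executemany("INSERT INTO proposition VALUES (?, ?)",
                     [("p1", "say"), ("p2", "say"), ("p3", "aim")])
    conn.executemany("INSERT INTO argument VALUES (?, ?, ?)",
                     [("a1", "p1", "ARG0"), ("a2", "p1", "ARG1"),
                      ("a3", "p2", "ARG0"), ("a4", "p3", "ARG0")])
    conn.executemany("INSERT INTO argument_node VALUES (?, ?)",
                     [("a1", "n1"), ("a2", "n2"), ("a3", "n3"), ("a4", "n4")])
    conn.executemany("INSERT INTO name_entity VALUES (?, ?)",
                     [("n1", "Person"), ("n2", "GPE"),
                      ("n3", "Organization"), ("n4", "Person")])
    return conn


def arg0_entity_counts(conn, lemma="say"):
    """Count name types on nodes that are ARG0 of the given predicate lemma."""
    rows = conn.execute(
        """
        SELECT ne.type
        FROM proposition AS p
        JOIN argument AS a       ON a.proposition_id = p.id
        JOIN argument_node AS an ON an.argument_id = a.id
        JOIN name_entity AS ne   ON ne.subtree_id = an.node_id
        WHERE p.lemma = ? AND a.type = 'ARG0'
        """,
        (lemma,),
    )
    return Counter(name_type for (name_type,) in rows)


counts = arg0_entity_counts(build_toy_db())
```

Passing the lemma as a bound parameter (`?`) rather than formatting it into the SQL string avoids quoting problems; pushing the filtering into the join also replaces several round trips to the database with one.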

40 Conclusion
• Integrating the annotation layers using a relational schema
  – Improves consistency
  – Allows predictive features that combine evidence from multiple layers
• Easily accessible
  – Through Python API
  – SQL queries

41 Interoperability: Components
A shared, formal type system allows multiple data formats to be combined effectively
[Diagram components: OntoNotes, OntoNotes Collection Reader, OntoNotes CAS Consumer; Annotations DB, ADB Collection Reader, ADB CAS Consumer; File System Collection Reader (XML, TXT); XCAS Collection Reader, XCAS CAS Consumer; UIMA Analysis Engine; Customer’s annotators]
[Key: Existing UIMA wrapper, New UIMA wrapper, RDBMS storage, File storage]

42 Issues for Discussion
Persistence formats optimize for different concerns
  – RDBMS: relational querying, update
  – XCAS: fast deserialization of run-time objects
Consider extending schema to hold XML serialization of document annotations

