AeroDAML: Applying Information Extraction to Generate DAML Annotations
Dr. Paul Kogut
Lockheed Martin Management & Data Systems
Page 2 What is Information Extraction?
[Diagram: text or web pages and linguistic knowledge feed into information extraction, which produces entities, relationships, events, and co-references.]
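The four output types named on this slide can be pictured as a small data model. The sketch below is illustrative only: the class names, the `employeeOf` and `Presentation` labels, and the hand-built example are assumptions, not AeroDAML's internal representation, and nothing here performs actual extraction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    id: str
    kind: str  # e.g. "Person" or "Organization"
    # Surface strings that refer to this entity (its co-references).
    mentions: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    kind: str     # hypothetical label, e.g. "employeeOf"
    subject: str  # entity id
    object: str   # entity id


@dataclass
class Event:
    kind: str                # hypothetical label, e.g. "Presentation"
    participants: List[str]  # entity ids


@dataclass
class ExtractionResult:
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def resolve(self, mention: str) -> Optional[Entity]:
        """Return the entity whose co-reference set contains `mention`."""
        for entity in self.entities.values():
            if mention in entity.mentions:
                return entity
        return None


# Hand-built result for an invented sentence pair:
# "Dr. Paul Kogut of Lockheed Martin presented AeroDAML. He showed examples."
result = ExtractionResult(
    entities={
        "e1": Entity("e1", "Person", ["Dr. Paul Kogut", "He"]),
        "e2": Entity("e2", "Organization", ["Lockheed Martin"]),
    },
    relationships=[Relationship("employeeOf", "e1", "e2")],
    events=[Event("Presentation", ["e1"])],
)
```

Keeping co-referring mentions on the entity itself makes it easy to map a pronoun such as "He" back to the person it refers to, which is what `resolve` does.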
Page 3 Extraction and Semantic Annotation
- Consumer-side extraction: third-party text -> database
  - Advantages:
    - Applicable to raw documents (most of the web)
  - Disadvantages:
    - Must deal with the full complexity of natural language
- Semantic annotation has been proposed to overcome the difficulty of consumer-side extraction, but annotation is labor intensive
- Producer-side extraction: authored text -> annotation
  - Advantages:
    - Partial automation reduces manual effort
    - Human-assisted disambiguation
    - Domain customization for intranets and B2B e-commerce
  - Disadvantages:
    - Requires manual effort to correct the output and add a rich set of relationships
    - Domain customization requires up-front effort from the author/webmaster
- Both types of extraction will coexist.
Page 4 AeroDAML Architecture
[Architecture diagram. Labels: text or web pages; Text Extraction; Extraction to DAML Translation; DAML Ontologies; DAML Annotator; basic annotation (shown twice); Annotation Editor; refined annotation; UBOT; DAML annotated text or web pages.]
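The flow on this slide (text in, extraction, then translation into markup against an ontology) can be sketched as follows. This is a minimal illustration under stated assumptions, not AeroDAML's implementation: a small gazetteer lookup stands in for a real extraction engine, the ontology namespace `ONT_NS` is a placeholder, and `GAZETTEER`, `extract_entities`, and `to_rdf_xml` are hypothetical names. The output is plain RDF/XML of the kind DAML annotations are built on.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ONT_NS = "http://example.org/ontology#"  # placeholder, not a real ontology


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str


# Hypothetical gazetteer: surface string -> ontology class name.
GAZETTEER = {
    "Lockheed Martin": "Organization",
    "Paul Kogut": "Person",
}


def extract_entities(text: str) -> List[Entity]:
    """Step 1 (text extraction): find known names, in order of appearance."""
    found = []
    for name, kind in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), Entity(name, kind)))
    result, seen = [], set()
    for _, entity in sorted(found, key=lambda pair: pair[0]):
        if entity not in seen:  # keep one entry per distinct entity
            seen.add(entity)
            result.append(entity)
    return result


def to_rdf_xml(entities: List[Entity]) -> str:
    """Step 2 (translation): emit one typed RDF node per entity."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ont", ONT_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for index, entity in enumerate(entities, start=1):
        node = ET.SubElement(
            root, f"{{{ONT_NS}}}{entity.kind}", {f"{{{RDF_NS}}}ID": f"e{index}"}
        )
        name = ET.SubElement(node, f"{{{ONT_NS}}}name")
        name.text = entity.text
    return ET.tostring(root, encoding="unicode")


entities = extract_entities("Paul Kogut works at Lockheed Martin.")
annotation = to_rdf_xml(entities)
```

In this sketch the "basic annotation" is the string returned by `to_rdf_xml`. A human editing that markup afterwards would correspond to the refined annotation step on the slide.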
Page 5 Client-Server AeroDAML
- Users:
  - personnel who routinely produce documents (e.g., intelligence analysts)
  - personnel who have a large collection of legacy documents
Page 6 Web-based AeroDAML
- Users:
  - novice/infrequent DAML annotators
  - people who want to do quick/simple annotation of a web page
Page 7 AeroDAML Output: Entities
Page 8 AeroDAML Output: Relationships
Page 9 AeroDAML Output: Co-reference
Page 10 AeroDAML Plans
- Integrate with annotation editor
- Improve Web-based AeroDAML
  - Allow the user to select ontologies other than the current AeroDAML default ontology for annotation generation:
    - OpenCyc or Cyc Upper Ontology
    - CIA World Factbook
    - IEEE Standard Upper Ontology
    - Dublin Core
    - UNSPSC...
- Try AeroDAML!