Integration of Information Extraction with an Ontology M. Vargas-Vera, J. Domingue, Y. Kalfoglou, E. Motta and S. Buckingham Shum.


1 Integration of Information Extraction with an Ontology
M. Vargas-Vera, J. Domingue, Y. Kalfoglou, E. Motta and S. Buckingham Shum

2 Introduction
Ontology -> Information Extractor
English text (NLP)
Group of tools in their IE system:
– KMi Ontology
– From UMass: Marmot, Crystal, Badger
– OCML preprocessor

3 Presentation Layout
Background on tool origins and area of work
Description of tool integration
Coping with ambiguity
Description of output
Population of Ontology
Future Work

4 UMass
University of Massachusetts Amherst
Marmot, Crystal, Badger
– Classify text by recognizing extraction patterns and semantic features associated with slots in predefined frames.

5 Testing Area: KMi Planet
Web-based news server
– Story Library: collections of news stories and postings
– Ontology Library: ontologies stored for use in extracting information from the story library. Uses OCML.
myPlanet uses cue phrases defined as “research areas” to query KMi Planet through the ontology library and the information extraction tools described in the following slides.

6 The Ontology Library
The ontology library can describe 40 different types of events or activities.
Event type 3: demonstration-of-technology
Slot (type) (example value)
technology-being-demonstrated (technology) (Info Extraction)
has-duration (duration) (30 min)
start-time (time-point) (3:30pm)
end-time (time-point) (4pm)
has-location (a place) (room 120 TMCB BYU campus)
other-agents-involved (list of person(s)) (Dr. Embley)
main-agent (list of person(s)) (Brian Goodrich)
location-at-start (a place) (room 120 TMCB BYU campus)
location-at-end (a place) (room 120 TMCB BYU campus)
medium-used (equipment) (multimedia projector, ppt)
subject-of-the-demo (title) (Integration of Information Extraction with an Ontology)
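To make the frame structure concrete, here is a minimal Python sketch of the demonstration-of-technology event as a record with one field per slot. The class name, the field types, and the `unfilled_slots` helper are assumptions made for illustration; the real ontology is written in OCML, not Python.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemonstrationOfTechnology:
    """Hypothetical Python mirror of the demonstration-of-technology frame."""
    technology_being_demonstrated: Optional[str] = None
    has_duration: Optional[str] = None
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    has_location: Optional[str] = None
    other_agents_involved: List[str] = field(default_factory=list)
    main_agent: List[str] = field(default_factory=list)
    location_at_start: Optional[str] = None
    location_at_end: Optional[str] = None
    medium_used: List[str] = field(default_factory=list)
    subject_of_the_demo: Optional[str] = None

    def unfilled_slots(self) -> List[str]:
        # A slot counts as unfilled when it is None or an empty list.
        return [name for name, value in vars(self).items() if not value]


# Partially filled frame, using example values from the slide.
demo = DemonstrationOfTechnology(
    technology_being_demonstrated="Info Extraction",
    start_time="3:30pm",
    end_time="4pm",
    main_agent=["Brian Goodrich"],
)
missing = demo.unfilled_slots()
```

A frame with unfilled slots is the situation an extractor faces when a story does not mention every detail of an event.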

7 Marmot
Natural Language Processor
Noun, Verb, and Prepositional Phrases
“John Domingue Wed, 15 Oct 1997. David Brown, University for Industry visits the OU.”
2 1
SUBJ(1): DAVID BROWN %COMMA% UNIVERSITY
PP (2): FOR INDUSTRY
VB (3): VISITS
OBJ1(4): THE OU
PUNC(5): %PERIOD%
1 1
SUBJ(1): JOHN DOMINGUE
ADVP(2): @WED_%COMMA%_15_OCT_1997@
PUNC(3): %PERIOD%
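The segment lines in the Marmot output follow a regular `TAG(n): TEXT` shape, so a downstream tool could read them with a small parser. The sketch below is an assumption based only on the sample shown on this slide; the `Segment` type and `parse_segments` function are hypothetical names, and the real Marmot output format may have cases this does not handle.

```python
import re
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    tag: str    # syntactic label, e.g. SUBJ, PP, VB, OBJ1, PUNC
    index: int  # position of the segment within the sentence
    text: str   # segment text, left exactly as emitted


# Matches lines such as "SUBJ(1): DAVID BROWN" and "PP (2): FOR INDUSTRY".
SEGMENT_RE = re.compile(r"^\s*([A-Z0-9]+)\s*\((\d+)\):\s*(.*?)\s*$")


def parse_segments(lines: Iterable[str]) -> List[Segment]:
    """Collect every line that looks like a tagged segment; skip the rest."""
    segments = []
    for line in lines:
        match = SEGMENT_RE.match(line)
        if match:
            segments.append(
                Segment(match.group(1), int(match.group(2)), match.group(3))
            )
    return segments


sample = [
    "2 1",
    "SUBJ(1): DAVID BROWN %COMMA% UNIVERSITY",
    "PP (2): FOR INDUSTRY",
    "VB (3): VISITS",
    "OBJ1(4): THE OU",
    "PUNC(5): %PERIOD%",
]
segments = parse_segments(sample)
```

The header line `2 1` does not match the pattern and is skipped; what those two numbers mean is not stated on the slide.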

8 Crystal
Dictionary Induction Tool
Uses keywords to annotate text with semantic tags.
– Visitor ( David Brown )
– Place ( the OU )
Specific-to-general, data-driven search
– Relaxes constraints on initial definitions until it finds the most specific definition that covers all instances of the word in the text.
Retains results for future use
Tested on over 300 stories, 100% precision and recall
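The specific-to-general idea can be illustrated with a toy sketch: start from a fully specified definition and drop constraints one at a time until the definition covers every instance. This is a deliberately simplified illustration, not Crystal's actual induction algorithm; the constraint names, the fixed relaxation order, and both function names are assumptions.

```python
def covers(definition, instance):
    """A definition covers an instance when every constraint matches."""
    return all(instance.get(slot) == value for slot, value in definition.items())


def relax_until_covering(seed, instances, relax_order):
    """Drop constraints in relax_order until all instances are covered.

    Stops at the first (most specific) definition that covers everything.
    """
    definition = dict(seed)
    for slot in relax_order:
        if all(covers(definition, inst) for inst in instances):
            break
        definition.pop(slot, None)
    return definition


# Hypothetical training instances for a "visits" pattern.
instances = [
    {"subject_head": "brown", "subject_class": "person",
     "verb": "visits", "object_class": "organization"},
    {"subject_head": "smith", "subject_class": "person",
     "verb": "visits", "object_class": "organization"},
]
relax_order = ["subject_head", "object_class", "subject_class", "verb"]
general = relax_until_covering(instances[0], instances, relax_order)
```

Here only the `subject_head` constraint has to be dropped, so the result still requires a person subject, the verb "visits", and an organization object.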

9 Badger
http://rockape.qgl.org/crap/badger.swf
Matches sentences from the text against concept nodes passed from Crystal.
Selects the best match by the maximum number of features matching the concept node.
Can remove irrelevant sentences from the problem set.
(Fairly certain whoever wrote this section did not speak English as a first language.)
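The selection rule described here, picking the concept node with the most matching features, can be sketched as a set-overlap score. The feature strings, concept node names, and the function name below are hypothetical; how Badger actually represents features is not covered in the slides.

```python
def best_concept_node(sentence_features, concept_nodes):
    """Return the concept node sharing the most features with the sentence.

    Returns None when no concept node shares any feature, which is how
    this sketch models discarding an irrelevant sentence.
    """
    best_name, best_score = None, 0
    for name, node_features in concept_nodes.items():
        score = len(node_features & sentence_features)
        if score > best_score:  # ties keep the earlier concept node
            best_name, best_score = name, score
    return best_name


# Hypothetical concept nodes, each a set of required features.
concept_nodes = {
    "visiting-a-place": {"verb:visits", "subj:person", "obj:place"},
    "giving-an-award": {"verb:awards", "subj:organization", "obj:person"},
}

visit = best_concept_node({"verb:visits", "subj:person", "obj:place"}, concept_nodes)
irrelevant = best_concept_node({"verb:rains"}, concept_nodes)
```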

10 Coping with Ambiguity
Query list of institutions
Query list of projects
Return list of institutions – no match
Return list of projects – match
No discussion of whether this was done automatically by the extractor or manually by the users.
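One way to read this slide is as an ordered lookup: check an ambiguous name against the list of institutions, then against the list of projects, and take the first list that contains it. The sketch below assumes that reading; the list contents and the `resolve_type` function are placeholders, not data from the paper.

```python
def resolve_type(name, lookups):
    """Return the type of the first list that contains the name, else None."""
    key = name.strip().lower()
    for type_name, known_names in lookups:
        if key in known_names:
            return type_name
    return None


# Placeholder lists; in the described system these would come from the ontology.
institutions = {"the open university", "university for industry"}
projects = {"example-project"}
lookups = [("institution", institutions), ("project", projects)]

# No match among institutions, match among projects.
resolved = resolve_type("Example-Project", lookups)
```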

11 OCML Code Translator
(Operational Conceptual Modeling Language)
Tokenises Badger output, finds the corresponding CN (concept node) definitions, and extracts all the objects found in the story.
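The last step of the translator, turning extracted objects into OCML, might look roughly like the sketch below. It assumes the slot values have already been extracted and emits a Lisp-style `def-instance` form; the exact OCML syntax, the instance and class names, and the slot names used here are assumptions and should be checked against the OCML documentation.

```python
def to_ocml_instance(instance_name, class_name, slots):
    """Render extracted slot values as an approximate OCML def-instance form."""
    slot_forms = [f"({slot} {value})" for slot, value in slots.items()]
    header = f"(def-instance {instance_name} {class_name}"
    body = "  (" + "\n   ".join(slot_forms) + "))"
    return header + "\n" + body


# Hypothetical names for the visit extracted from the sample story.
ocml_text = to_ocml_instance(
    "visit-1",
    "visiting-a-place-or-people",
    {"has-main-agent": "david-brown", "has-location": "the-ou"},
)
print(ocml_text)
```

Keeping the rendering separate from the extraction means the same extracted slots could later be written out in another format.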

12 Ontology Maintenance
Uses Badger (lexicon) and Crystal (concept) output to automatically update the Ontology library whenever a new story is added to the Story library.
Some updates cannot be made automatically:
– There is not enough information in the story.
– No current template matches the sentence concepts.

13 Conclusion
IE system created using Marmot, Crystal, Badger and the OCML translator.
Obtained good results on KMi stories.
Assessment
– Sporadic periods of quality technical writing, interspersed with nearly impenetrable English
– A borrowing of tools, translated to OCML and ported for KMi

14 Future Work
Deriving the type of an object when it does not match a predefined template.
Automatic creation of new classes and subclasses.
Using this IE tool in other domains (need new training data?)
Trying out a new Machine Learning algorithm in Crystal and comparing performance.
Using the IE tool with hypertext.
Saving Badger’s output in XML
Creating a more visual GUI for the ontologies.

