© Paul Buitelaar: KnowledgeWeb Summer School, Spain - July 2004 Human Language Technology in Ontology Engineering Ontology Learning from Text Paul Buitelaar.

1 Human Language Technology in Ontology Engineering: Ontology Learning from Text
Paul Buitelaar
DFKI GmbH, Language Technology Lab
DFKI Competence Center Semantic Web
Saarbrücken, Germany

2 Overview
• HLT and Ontology Engineering
• Automated Linguistic Analysis
• Ontology Learning from Text
• Further Issues: Evaluation
• Conclusions

3 Ontology Lifecycle
Creating, Populating, Validating, Evolving, Maintaining, Deploying

4 HLT in the Ontology Lifecycle
HLT for Ontology Learning and Population from Text
Human Language Technology = Automated Linguistic Analysis
• Ontology Learning (Development & Evolution): Linguistic Analysis to Extract Classes / Relations
• Ontology Population (Knowledge Base Generation): Linguistic Analysis to Extract Instances
Diagram labels: Documents (Text); Ontology (Knowledge); Classes, Relations/Properties; Instances

5 Automated Linguistic Analysis

6 Linguistic Analysis: Example
The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
Diagram labels: Dell computer, flat screen, motherboard, has-a, reject, failure, location-of, animate-entity

7 Levels of Linguistic Analysis
Lexical Analysis
• Word Class: Part-of-Speech (also Semantic Class)
• Word Structure: Morphology
Phrase Analysis
• Sentence Structure: Phrases (if ‘shallow’: Chunks)
• Semantic Units
Dependency Structure Analysis
• Sentence Meaning: Predicate Argument Structure (Clause)
• Semantic Structure

8 Part-of-Speech, Morphology
Part-of-Speech
• e.g.: noun, verb, adjective, preposition, …
• PoS tag sets may have between 10 and 50 (or more) tags
Morphology
• Most languages have inflection and declension, e.g.:
    Singular/Plural: computer, computers
    Present/Past: reject, rejected
• Many languages also have complex (de)composition, e.g.:
    Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
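
The compound decomposition on slide 8 (Flachbildschirm > flach + Bildschirm > flach + Bild + Schirm) can be illustrated with a small dictionary-based splitter. This is only a sketch under stated assumptions: the function `split_compound` and the toy `LEXICON` are hypothetical, linking elements such as the German "-s-" are ignored, and nothing here is taken from the tools discussed in the talk.

```python
def split_compound(word, lexicon):
    """Return one decomposition of `word` into lexicon entries, or None.

    Recursive search that tries the longest known prefix first.
    Assumption: every part appears verbatim (lower-cased) in `lexicon`;
    linking elements are deliberately not handled in this sketch.
    """
    word = word.lower()
    if not word:
        return []  # nothing left to split: successful end of recursion
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None  # no decomposition with this lexicon


# Toy lexicon (hypothetical); a real system would use a full lemma list.
LEXICON = {"flach", "bild", "schirm", "bildschirm"}

if __name__ == "__main__":
    # First level of decomposition: the lexicon knows "bildschirm" as a unit.
    print(split_compound("Flachbildschirm", LEXICON))
    # Full decomposition: drop the lexicalised compound from the lexicon.
    print(split_compound("Flachbildschirm", LEXICON - {"bildschirm"}))
```

Preferring the longest prefix keeps lexicalised compounds such as Bildschirm intact; removing them from the lexicon forces the finer-grained split.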

9 Phrases, Terms, Named Entities
Semantic Units
• Phrases (e.g. nominal - NP, prepositional - PP)
    NP: a flat screen
    PP: with a flat screen
    NP (recursive): the Dell computer with a flat screen; a failure in the motherboard
• Terms (domain-specific phrases)
    Dell computer
    Dell computer with a flat screen
• Named Entities (phrases corresponding to dates, names, …)
    COMPANY: Dell
    COMPANY: Dell Computer Corporation
    PERSON: Michael Dell
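
To make the NP and PP examples on slide 9 concrete, the following sketch groups hand-tagged tokens into flat ("shallow") chunks. The tag names (DET, ADJ, NOUN, PREP), the two chunk patterns and the function `chunk` are assumptions chosen for illustration; the recursive NP from the slide is not handled.

```python
def chunk(tagged):
    """Group (token, tag) pairs into flat NP and PP chunks.

    Assumed patterns: NP = DET? ADJ* NOUN+ and PP = PREP NP.
    Tokens that do not start a chunk are skipped.
    """
    chunks, i, n = [], 0, len(tagged)

    def match_np(start):
        """Return the end index of an NP starting at `start`, or None."""
        j = start
        if j < n and tagged[j][1] == "DET":
            j += 1
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        return k if k > j else None  # at least one noun is required

    while i < n:
        if tagged[i][1] == "PREP":
            end, label = match_np(i + 1), "PP"
        else:
            end, label = match_np(i), "NP"
        if end is None:
            i += 1
            continue
        chunks.append((label, " ".join(tok for tok, _ in tagged[i:end])))
        i = end
    return chunks


# Hand-assigned tags (an assumption; a PoS tagger would supply these).
SENTENCE = [
    ("the", "DET"), ("Dell", "NOUN"), ("computer", "NOUN"),
    ("with", "PREP"), ("a", "DET"), ("flat", "ADJ"), ("screen", "NOUN"),
]

if __name__ == "__main__":
    for label, text in chunk(SENTENCE):
        print(label, text)
```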

10 Dependency Structure (I)
Semantic Structure
• Dependencies between Predicates and Arguments
    the Dell computer with a flat screen had to be rejected
    PRED: reject
    ARG1: ENTITY
    ARG2: ‘the Dell computer with a flat screen’
    ‘Logical Form’: reject(x,y) & animate-entity(x) & computer(y) & …
• Dependency Structure Analysis is based on:
    Sub-categorization Frames
      reject :: Subj:NP, Obj:NP
    Selection Restrictions
      reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
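
The selection restriction on slide 10 (reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY) can be read as a type check on the arguments of a predicate. The sketch below assumes a tiny, hypothetical type hierarchy (`ISA`) and hypothetical helper names; only the frame for "reject" comes from the slide.

```python
# Frame for "reject" as given on the slide: role -> required semantic class.
FRAMES = {
    "reject": {"Subj": "ANIMATE-ENTITY", "Obj": "ENTITY"},
}

# Hypothetical toy hierarchy (child -> parent); not part of the talk.
ISA = {
    "ANIMATE-ENTITY": "ENTITY",
    "PERSON": "ANIMATE-ENTITY",
    "COMPUTER": "ENTITY",
}


def is_a(sem_type, required):
    """True if `sem_type` equals `required` or inherits from it via ISA."""
    while sem_type is not None:
        if sem_type == required:
            return True
        sem_type = ISA.get(sem_type)
    return False


def satisfies(pred, args):
    """Check that every role in the frame of `pred` is filled by a compatible type."""
    frame = FRAMES[pred]
    return all(role in args and is_a(args[role], frame[role]) for role in frame)


if __name__ == "__main__":
    print(satisfies("reject", {"Subj": "PERSON", "Obj": "COMPUTER"}))
    print(satisfies("reject", {"Subj": "COMPUTER", "Obj": "PERSON"}))
```

A computer is an ENTITY but not an ANIMATE-ENTITY in this toy hierarchy, so it can be rejected but cannot be the one rejecting.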

11 Dependency Structure (II)
The Dell computer that has been rejected was claimed to have suffered from handling.
Logical form:
    reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
Lexical Functional Grammar (LFG) structure:
    PRED   claim
    SUBJ   y1: [ PRED computer, MOD Dell, ADJUNCT [ PRED reject ] ]
    XCOMP  [ PRED suffer, SUBJ y1, OBL-from handling ]
Dependency graph (figure): nodes claim, y1 (computer), Dell, reject, suffer, handling; edges SUBJ, XCOMP, MOD, ADJUNCT, OBL-from

12 Ontology Learning from Text

13 Some History
Lexical Knowledge Extraction
• Extraction of lexical semantic representations (word meaning) from Machine Readable Dictionaries - 70s/80s
• Extraction of semantic lexicons from corpora for Information Extraction systems - 80s/90s, e.g. CRYSTAL (Soderland)
• Answer extraction in Question Answering, e.g. Webclopedia (Hovy)
Thesaurus Extraction
• Similar work, (complex, multilingual) term extraction
• e.g. Sextant (Grefenstette); DR-Link (Liddy)
Ontology Learning from Text
• Similar work, (domain-specific) term / relation extraction
• e.g. TextToOnto (Maedche & Staab), OntoLearn (Velardi et al.)
• Discussed here: OntoLT (Buitelaar, Olejnik & Sintek)

14 TextToOnto Association Rules

15 OntoLearn Domain-Specific WordNet Tuning and Extension

16 OntoLT: Some Background
Ontology Learning from Text
• Taxonomy Extraction, Document Clustering
    String-based, Document Level
• “Unnamed” Relation Extraction, Word Clustering
    Stemming & Part-of-Speech, Token Level
• Extraction of Terms, “Named” Relations
    Pred-Arg & Head-Mod Structure, Term Level
Text in Ontology Engineering
• Textual Grounding of Concepts
    Retain Linguistic Contexts and Realizations
• Text-based Ontology Monitoring
    Compare Language Use over Time
Diagram labels: TextToOnto, OntoLearn

17 OntoLT: Some Background
Ontology Learning from Text
• Taxonomy Extraction, Document Clustering
    String-based, Document Level
• “Unnamed” Relation Extraction, Word Clustering
    Stemming & Part-of-Speech, Token Level
• Extraction of Terms, “Named” Relations
    Pred-Arg & Head-Mod Structure, Term Level
Text in Ontology Engineering
• Textual Grounding of Concepts
    Retain Linguistic Contexts and Realizations
• Text-based Ontology Monitoring
    Compare Language Use over Time
Diagram label: OntoLT

18 OntoLT
What is it?
OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection.
How does it work?
1. automatic linguistic annotation
2. automatic statistical preprocessing
3. interactive definition of mapping rules
4. interactive user validation of candidates
5. automatic integration into an ontology

19 OntoLT: Architecture

20 Linguistic Annotation
Example sentence (German): An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal femoral fixiert.
(English: In 40 knee joint specimens, middle patellar tendon thirds were fixed femorally in a two-stage drill channel using a new bone-blocking technique.)
Annotated phrase: mittlere Patellarsehnendrittel (mid patellar ligament third)
Annotation: … mittler patellar Sehne Drittel …

21 Mapping Rules
Precondition Language
    Var(Y, XPath(Y)): get all occurrences of element Y, e.g. HeadNoun, Modifier, Subject, …
    Concat, ConcatList
    Combined through AND, OR, NOT, EQUAL
Operators
    CreateCls: create a new class with super-class
    AddSlot: add a slot with range to a new or existing class
    CreateInst: introduce an instance for a new or existing class
    FillSlot: set the value of a slot of an instance
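
The interplay of a precondition (an XPath selection over the linguistic annotation) and an operator such as CreateCls can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the XML element and attribute names (`phrase`, `headnoun`, `modifier`, `lemma`), the root class `Thing`, and the function name are hypothetical and do not reproduce OntoLT's annotation format or rule syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation fragment; the real annotation schema is not shown in the slides.
ANNOTATION = """
<sentence>
  <phrase type="NP">
    <modifier lemma="flat"/>
    <headnoun lemma="screen"/>
  </phrase>
  <phrase type="NP">
    <headnoun lemma="motherboard"/>
  </phrase>
</sentence>
"""


def apply_head_modifier_rule(xml_text, root_class="Thing"):
    """Toy mapping rule.

    Precondition: a phrase that contains a head noun (selected via XPath).
    Operator:     a CreateCls-like step that creates a class for the head noun
                  and, for each modifier, a sub-class "<modifier>_<head>".
    Returns a dict mapping class name -> super-class name.
    """
    ontology = {}
    root = ET.fromstring(xml_text)
    for phrase in root.findall(".//phrase"):
        head = phrase.find("headnoun")
        if head is None:
            continue  # precondition not met for this phrase
        head_cls = head.get("lemma")
        ontology.setdefault(head_cls, root_class)
        for mod in phrase.findall("modifier"):
            ontology[f"{mod.get('lemma')}_{head_cls}"] = head_cls
    return ontology


if __name__ == "__main__":
    for cls, parent in apply_head_modifier_rule(ANNOTATION).items():
        print(f"{cls} is-a {parent}")
```

In an interactive workflow the resulting classes would be candidates only, to be validated by the user before integration into the ontology.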

23 Example Experiment
Ontology Extraction for Neurology
• Neurology Section of a Medical Corpus
    Medical Scientific Journal Abstracts - MuchMore Project
• XML-based Linguistic Annotation
    PoS, Lemmatization, Phrases, Pred-Arg Structure
• Statistical Preprocessing (chi-square)
    Select Domain-Relevant Linguistic Entities
• Definition of Mapping Rules
    Define Operators for Selected Linguistic Entities
• Generate & Validate Class/Slot Candidates
    Select Candidates for Integration in Neurology Ontology
• Generate “Ontology Fragments” for Neurology
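
Slide 23 names chi-square as the statistic for selecting domain-relevant linguistic entities but gives no formula. The sketch below assumes the standard Pearson chi-square on a 2x2 table that compares a term's frequency in the domain corpus with its frequency in a reference corpus; the function name, the example counts and the use of a single reference corpus are illustrative assumptions, not a description of OntoLT's implementation.

```python
def chi_square(term_domain, total_domain, term_ref, total_ref):
    """Pearson chi-square for a 2x2 contingency table.

                     domain corpus   reference corpus
    term                  a                b
    all other tokens      c                d

    A high value means the term's relative frequency differs strongly
    between the two corpora. The statistic itself does not say in which
    direction, so a caller should also compare a/(a+c) with b/(b+d).
    """
    a, b = term_domain, term_ref
    c, d = total_domain - term_domain, total_ref - term_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table (e.g. term absent from both corpora)
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Made-up counts: 30 hits in 1,000 domain tokens vs. 10 hits in 10,000 reference tokens.
    score = chi_square(30, 1000, 10, 10000)
    # 3.84 is the 5% critical value for one degree of freedom.
    print(f"chi-square = {score:.2f}, domain-relevant candidate: {score > 3.84}")
```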

36 Further Issues
Future Development
• Organization of Class/Slot Candidate List
    Inference & Clustering - “Graph Restructuring”
• Extend Statistical Preprocessing
    Multiple Reference Corpora
    Extended Frequency Information
• Include Machine Learning Approach
    Semi-Automatic Definition of Mapping Rules
Performance Evaluation
• Guidelines
    ECAI04 Workshop on OLP
• Benchmark
    Challenge within PASCAL NoE

37 Evaluation: What? -- Subtasks
• Classes
    (Multilingual) Term Extraction
    Named-Entity Recognition
    Similarity Thesaurus
    Term, Document Clustering
• Class-Hierarchy (Taxonomy)
    Thesaurus Extraction
    Term, Document Clustering
• Class-Properties (Relations)
    Relation Extraction
    ? Formal Properties of Relations (Properties)
• Class-Instances (Individuals)
    (Multilingual) Term Extraction
    Named-Entity Recognition
    Term, Document Classification

38 Evaluation: How?
By Sub-Task - Evaluation of:
• Classes - Term, NE Extraction, Clustering
• Class-Hierarchy - Thesaurus Extraction
• Class-Properties - Relation Extraction
• Class-Instances - Term, NE Extraction, Classification
By Application - Evaluation of:
• Ontology Learning and Population - Gold Standard
• IR, QA - Precision/Recall Increase with Ontology?
• Interactive QA - Increased User Satisfaction?
• Information Access - Increased User Performance?
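
Evaluating extracted classes against a gold standard, as listed on slide 38, usually comes down to precision and recall over two sets. The sketch below assumes exact string matching between extracted and gold class names; the function name and the example terms are made up for illustration.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted items against a gold-standard set.

    Assumption: items match only if their names are identical strings.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up class candidates and gold standard.
    extracted = {"neuron", "synapse", "patient", "study"}
    gold = {"neuron", "synapse", "axon"}
    p, r = precision_recall(extracted, gold)
    print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Exact matching is the strictest choice; evaluating a class hierarchy or relations would need a measure that also credits partially correct structure.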

39 Conclusions
Stay Tuned
• OntoLT Release
    To be Announced on Protégé-Discussion List
    http://protege.stanford.edu/mailing-lists
• Evaluation
    Ontology Learning & Population (OLP) Challenge
      Within PASCAL NoE - First Task Spring 2005
    ECAI04 Workshop: Evaluation of Text-based OLP
      http://olp.dfki.de/ECAI04/cfp.htm

