Modeling Textual Definition Contents With BFO 2.0 and Creating Related Linguistic/NLP Resources Selja Seppälä October 29, 2014 NCBO Webinar Series.

1 Modeling Textual Definition Contents With BFO 2.0 and Creating Related Linguistic/NLP Resources Selja Seppälä October 29, 2014 NCBO Webinar Series

2 Objectives
Developing the specifications for widely usable software to help in definition writing
➔ Creating language- and domain-independent definition models (templates)
➔ Using the BFO categories and relations
➔ Creating BFO-related linguistic/NLP resources for large-scale corpus analyses of definitions
NCBO Webinar Series | October 29, 2014 | Selja Seppälä

3 The general idea
leukocyte (CL_0000738): An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue.
Annotated segments:
GEN is_a OBJECT: An achromatic cell
SPE other: develops_from MATERIAL ENTITY: of the myeloid or lymphoid lineages
SPE bearer_of DISPOSITION: capable of ameboid movement,
SPE located_in MATERIAL ENTITY: found in blood or other tissue.
Relational model for OBJECT:
OBJECT: s has_part OBJECT; s has_part OBJECT AGGREGATE
MATERIAL ENTITY: s bearer_of DISPOSITION; s bearer_of QUALITY; s contains PROCESS; s contains PROCESS BOUNDARY; s has_history PROCESS; a has_part IMMATERIAL ENTITY; a has_part MATERIAL ENTITY; a located_in INDEPENDENT CONTINUANT; s material_basis_of DISPOSITION; a occupies THREE-DIMENSIONAL SPATIAL REGION; a part_of IMMATERIAL ENTITY; a part_of MATERIAL ENTITY; s participates_in PROCESS

4 Outline
Background on definitions
Introducing the problem
Proposal: modeling definition contents with BFO
Methodology
Creating BFO-related linguistic/NLP resources
Conclusion

5 BACKGROUND ON DEFINITIONS

6 Object of study (1)

7 Object of study (2)

8 Object of study (3)

9 Object of study (4)

10 Functions of a definition
Conveying meaning of domain-specific terms
Specifying terms’ intended meaning ➔ Disambiguation function
Problems when
– Information is not relevant, e.g. non-defining
– No definition at all
➔ How to understand/use the term?

11 Definition of ‘definition’
A single sentence
Composed of pieces of information, e.g. found in experts’ texts
Which are relevant to the experts’ activity
About the types of things to which the defined term refers
Has classical Aristotelian structure
– One genus
– One or more differentiae (specifiers)

12 Example from the Uber Anatomy Ontology (UBERON)
ligament: Dense regular connective tissue connecting two or more adjacent skeletal elements or supporting an organ.
genus: type of thing
differentia: distinguishing characteristic

13 Example from the Cell Type Ontology (CL)
leukocyte: An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue.
genus: type of thing
differentiae: distinguishing characteristics

14 Definition writing rules, issues, and needs
Subject to a few formal rules (punctuation, etc.)
Only vague principles for selecting the right pieces of information
Time-consuming, costly, and prone to all kinds of inconsistencies
➔ Create computer software to help authors
– Improve quality of dictionaries & ontologies
– Comply with the publisher’s formal rules (Seppälä 2006)
– Include relevant defining contents

16 INTRODUCING THE PROBLEM

17 Example: compiling a corpus
5.3.2 Net booms
Netting can be used as a boom to collect viscous oils at sea. Vessels deploy the netting, manoeuvring it to surround and concentrate the oil for recovery. In inshore waters, it may be possible to use nets as a moored boom. Net booming is based on the principle that air and water will pass through the net but not viscous oil. Therefore, there is little or no downward current to carry oil underneath the net and the net is less prone to being blown flat. Because of the lower resistance of nets to water movement, it should also be possible in theory to deploy net booms in faster currents than is possible with conventional booms. Net booms have not yet been fully tested in an actual oil spill but results from field trials have been encouraging. A net boom consists of a long strip of netting, supported at frequent and regular intervals by poles with floats and weights attached to keep the netting upright (Figure 20).
(Source: http://www.dft.gov.uk/mca/chapter_5.pdf)

18 Example: extracting information (same Net booms passage as slide 17)

19 Example: identifying potentially defining information (same passage)

20 Example: selecting relevant information for defining the term (same passage)

21 Example: composing the definition (same passage)

22 A content selection problem
How to select the relevant kind of information? ➔ Selection principles
How to create rules to help authors select the relevant information? ➔ Content modeling

23 PROPOSAL: MODELING DEFINITION CONTENTS WITH BFO

24 Content selection principles
Extensional dimension
– Ontological factors
  Category of the referent (OBJECT, PROCESS, QUALITY, etc.)
  Type of things or particular (particle accelerator vs. the LHC)
Contextual dimension
– Conceptual system of the domain
– Background knowledge of the user
Communicative dimension
– User needs
– Communication situation (linguistic expression: quadruped vs. four legged)

25 A generic solution to content selection (same three dimensions as slide 24: extensional, contextual, communicative)

26 Solution to content representation
➔ Represent contents with models based on categories and relations (Sager 1990, Sager & L’Homme 1994)
leukocyte: An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue.
is_a OBJECT
capable_of PROCESS
develops_from MATERIAL ENTITY
located_in MATERIAL ENTITY

27 Questions
What kinds of entities are there?
What are their characteristics?
➔ Use a realist upper-level ontology: the Basic Formal Ontology (BFO)

28 Proposal
Create definition models based on the BFO categories and their characteristics
See to what extent these models predict the contents of definitions
➔ Ontological analysis framework

29 METHODOLOGY

30 Example: modeling definitions of OBJECTS
ligament (UBERON_0000211)
GEN is_a OBJECT: Dense regular connective tissue
SPE bearer_of FUNCTION: connecting two or more adjacent skeletal elements or supporting an organ.
leukocyte (CL_0000738)
GEN is_a OBJECT: An achromatic cell
SPE other: develops_from MATERIAL ENTITY: of the myeloid or lymphoid lineages
SPE bearer_of DISPOSITION: capable of ameboid movement,
SPE located_in MATERIAL ENTITY: found in blood or other tissue.
Relational model for OBJECT:
OBJECT: s has_part OBJECT; s has_part OBJECT AGGREGATE
MATERIAL ENTITY: s bearer_of DISPOSITION; s bearer_of QUALITY; s contains PROCESS; s contains PROCESS BOUNDARY; s has_history PROCESS; a has_part IMMATERIAL ENTITY; a has_part MATERIAL ENTITY; a located_in INDEPENDENT CONTINUANT; s material_basis_of DISPOSITION; a occupies THREE-DIMENSIONAL SPATIAL REGION; a part_of IMMATERIAL ENTITY; a part_of MATERIAL ENTITY; s participates_in PROCESS

31 Creating definition models
Use BFO specifications, OWL file & FOL axioms
To analyze BFO categories as relational configurations (RCs): ‘CATEGORY + relation + CATEGORY’
E.g.:
some OBJECT has_part OBJECT
some OBJECT participates_in PROCESS
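
As a rough illustration of the ‘CATEGORY + relation + CATEGORY’ pattern, a relational configuration can be held as a small record. This is a minimal Python sketch, not the author's implementation: the class name `RC`, its field names, and the parsing convention (categories in capitals, relations in lowercase) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RC:
    """One relational configuration: CATEGORY + relation + CATEGORY."""
    modality: str   # e.g. "some", as in the slide's examples
    subject: str    # category being characterized
    relation: str   # relation name
    target: str     # category at the other end of the relation


def parse_rc(text: str) -> RC:
    """Parse a string such as 'some OBJECT has_part OBJECT AGGREGATE'.

    Hypothetical convention: category names are written in capitals,
    so the relation is the first token after the modality that is not.
    """
    tokens = text.split()
    modality, rest = tokens[0], tokens[1:]
    rel_index = next(i for i, tok in enumerate(rest) if not tok.isupper())
    return RC(
        modality,
        " ".join(rest[:rel_index]),
        rest[rel_index],
        " ".join(rest[rel_index + 1:]),
    )


rcs = [
    parse_rc("some OBJECT has_part OBJECT"),
    parse_rc("some OBJECT participates_in PROCESS"),
]
```

Keeping the record immutable (`frozen=True`) lets RCs be stored in sets or used as dictionary keys when counting them across a corpus.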

32 MATERIAL ENTITIES in BFO 2.0
MATERIAL ENTITY: s bearer_of DISPOSITION; s bearer_of QUALITY; s contains PROCESS; s contains PROCESS BOUNDARY; s has_history PROCESS; a has_part IMMATERIAL ENTITY; a has_part MATERIAL ENTITY; a located_in INDEPENDENT CONTINUANT; s material_basis_of DISPOSITION; a occupies THREE-DIMENSIONAL SPATIAL REGION; a part_of IMMATERIAL ENTITY; a part_of MATERIAL ENTITY; s participates_in PROCESS
Subcategories (is_a MATERIAL ENTITY):
OBJECT: s has_part OBJECT; s has_part OBJECT AGGREGATE
OBJECT AGGREGATE: a has_member_part OBJECT
FIAT OBJECT PART: a part_of OBJECT

33 The model for OBJECT
Creating relational models with proper and inherited RCs
Relations characterizing the category OBJECT:
OBJECT: s has_part OBJECT; s has_part OBJECT AGGREGATE
Relations inherited from the category MATERIAL ENTITY:
MATERIAL ENTITY: s bearer_of DISPOSITION; s bearer_of QUALITY; s contains PROCESS; s contains PROCESS BOUNDARY; s has_history PROCESS; a has_part IMMATERIAL ENTITY; a has_part MATERIAL ENTITY; a located_in INDEPENDENT CONTINUANT; s material_basis_of DISPOSITION; a occupies THREE-DIMENSIONAL SPATIAL REGION; a part_of IMMATERIAL ENTITY; a part_of MATERIAL ENTITY; s participates_in PROCESS
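
The idea of combining proper and inherited RCs can be sketched as a walk up the is_a chain. The Python below is only an illustration under stated assumptions: the dictionary layout and function names are hypothetical, and just a few of the MATERIAL ENTITY relations from the slide are reproduced. The final check uses exact matching, which is a simplification.

```python
# Parent links between categories (only the link used here).
IS_A = {"OBJECT": "MATERIAL ENTITY"}

# "Proper" RCs per category; a subset of those listed on the slide.
PROPER_RCS = {
    "OBJECT": [("has_part", "OBJECT"), ("has_part", "OBJECT AGGREGATE")],
    "MATERIAL ENTITY": [
        ("bearer_of", "DISPOSITION"),
        ("bearer_of", "QUALITY"),
        ("located_in", "INDEPENDENT CONTINUANT"),
        ("participates_in", "PROCESS"),
    ],
}


def relational_model(category):
    """Collect proper RCs first, then RCs inherited along the is_a chain."""
    model = []
    current = category
    while current is not None:
        origin = "proper" if current == category else "inherited from " + current
        for relation, target in PROPER_RCS.get(current, []):
            model.append((relation, target, origin))
        current = IS_A.get(current)
    return model


def predicted(model, relation, target):
    """True if an annotated relation/category pair appears in the model."""
    return any(rel == relation and tgt == target for rel, tgt, _ in model)


object_model = relational_model("OBJECT")
```

With this toy model, the leukocyte segment annotated `bearer_of DISPOSITION` is found in the model, while `develops_from MATERIAL ENTITY` is not, which matches its "other" label in the annotated example.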

34 Testing the models
Applying the models to multilingual & multi-domain textual definitions corpora
Segmenting: GEN + SPE
Annotating: with relational models
Statistical analyses to see
– Which relations are more relevant for defining
– If new ones can be added to the models
➔ Relevant models from generic ones

35 Annotation example (1)
EN_AB_11 protein biosynthesis in eukaryotic cells cell protein
protein: A molecule consisting of different amino acids linked to each other in a specific sequence by peptide bonds.

36 Annotation example (2)
...
protein: A molecule consisting of different amino acids linked to each other in a specific sequence by peptide bonds.
...

37 Results grid for OBJECT (BFO 1.0)
OBJECT                        normal  categ.  other  total  %
has_part OBJECT               15      1              16     26
participates_in PROCESS       3       6              9
bearer_of QUALITY             19      3              22     60
bearer_of REALIZABLE ENTITY   31      1              32
located_at TEMPORAL REGION    0       0              0
located_in SITE               2       2              4
other                                         14
TOTAL                         70      13      14     97     100
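
The grid's arithmetic can be checked with a few lines of Python. The reading of the flattened numbers used here is an assumption: it was chosen because the column sums then match the TOTAL row (70, 13, 14, 97). Under that reading, the two percentages shown (26 and 60) are consistent with shares computed over groups of rows rather than single rows, which is itself only an inference.

```python
# Counts per relation as (normal, categ., other); an assumed reading
# of the flattened grid, not data confirmed by the source.
rows = {
    "has_part OBJECT": (15, 1, 0),
    "participates_in PROCESS": (3, 6, 0),
    "bearer_of QUALITY": (19, 3, 0),
    "bearer_of REALIZABLE ENTITY": (31, 1, 0),
    "located_at TEMPORAL REGION": (0, 0, 0),
    "located_in SITE": (2, 2, 0),
    "other": (0, 0, 14),
}

row_totals = {name: sum(counts) for name, counts in rows.items()}
column_totals = tuple(sum(col) for col in zip(*rows.values()))
grand_total = sum(row_totals.values())


def share(names):
    """Rounded percentage of all annotations covered by the given rows."""
    return round(100 * sum(row_totals[n] for n in names) / grand_total)


first_group = share(["has_part OBJECT", "participates_in PROCESS"])
second_group = share([
    "bearer_of QUALITY",
    "bearer_of REALIZABLE ENTITY",
    "located_at TEMPORAL REGION",
    "located_in SITE",
])
```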

38 CREATING BFO-RELATED LINGUISTIC/NLP RESOURCES

39 Creating input resources
Corpora of existing definitions
– Multi-domain & multilingual
– Checked for adequacy of form (spelling, punctuation, etc.)
– May be syntactically parsed
BFO-Definition-Models
– XML format
– Encoded: full/short name for relations, synonyms, sources, purl, comments, modality (all, some, no)
Programs for automating
– Genus segmentation & tagging
– Genus annotation with BFO category
➔ More systematic & increased reliability
➔ Pre-annotation of large-scale corpora
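
To make the XML encoding of a definition model more concrete, here is a minimal Python sketch that reads one such model with the standard library. The slides do not show the actual schema, so every element and attribute name below (`model`, `rc`, `relation`, `short`, `full`, `target`) is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of one model; the real schema is not shown
# in the slides, so these names are placeholders.
MODEL_XML = """
<model category="OBJECT">
  <rc modality="some">
    <relation short="has_part" full="has part"/>
    <target>OBJECT AGGREGATE</target>
  </rc>
  <rc modality="some">
    <relation short="participates_in" full="participates in"/>
    <target>PROCESS</target>
  </rc>
</model>
"""


def load_model(xml_text):
    """Return (category, list of RC dictionaries) from the XML text."""
    root = ET.fromstring(xml_text.strip())
    rcs = []
    for rc in root.findall("rc"):
        relation = rc.find("relation")
        rcs.append({
            "modality": rc.get("modality"),
            "relation": relation.get("short"),
            "full_name": relation.get("full"),
            "target": rc.findtext("target"),
        })
    return root.get("category"), rcs


category, rcs = load_model(MODEL_XML)
```

Synonyms, sources, and comments could be added as further child elements in the same way.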

40 Genus tagging: segmenting
A. Create processing resources
1. Morpho-syntactic rules to get SPE preceding the GEN
2. List of terms from dictionary or ontology
3. Morpho-syntactic rules to find end of GEN
B. For each definition
1. Apply rules in A.1 to tag beginning of definitions with a premodifying tag
2. Search for terms at the beginning of definition or after the first tag using list in A.2
3. If a term is found, tag it as the genus
4. Else, apply rules in A.3 to tag end of GEN that does not contain a term

41 Segmenting examples
ligament: Dense regular connective tissue connecting two or more adjacent skeletal elements or supporting an organ.
morpho-syntactic rule: .*ing
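
The segmenting procedure (resources A.1 to A.3, steps B.1 to B.4) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the author's program: the premodifier rule, the term list, and the way the `.*ing` rule is used as a fallback for the end of the genus are all hypothetical.

```python
import re

# A.1: hypothetical rule for premodifiers preceding the genus
# (an optional article followed by adjectives from a toy list).
PREMODIFIER = re.compile(
    r"^(?:(?:an?|the)\s+)?(?:(?:achromatic|small|large)\s+)*", re.IGNORECASE
)
# A.2: toy term list standing in for a dictionary or ontology.
TERMS = ["dense regular connective tissue", "connective tissue", "cell"]
# A.3: fallback rule; the genus ends before the first word in -ing.
GENUS_END = re.compile(r"\s+\w+ing\b")


def segment(definition):
    """Return (premodifier, genus, rest) for one definition string."""
    # B.1: tag the premodifying material at the start of the definition.
    premod = PREMODIFIER.match(definition).group(0)
    body = definition[len(premod):]
    # B.2 and B.3: look for the longest known term at the start of the body.
    for term in sorted(TERMS, key=len, reverse=True):
        if re.match(re.escape(term) + r"\b", body, re.IGNORECASE):
            return premod.strip(), body[:len(term)], body[len(term):].strip()
    # B.4: no known term, so cut the genus where the fallback rule fires.
    match = GENUS_END.search(body)
    end = match.start() if match else len(body)
    return premod.strip(), body[:end], body[end:].strip()


ligament = segment(
    "Dense regular connective tissue connecting two or more adjacent "
    "skeletal elements or supporting an organ."
)
```

Trying the longest term first keeps "dense regular connective tissue" from being cut down to a shorter entry of the list.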

42 Genus tagging: annotating
No existing mappings of linguistic expressions to BFO categories
Tests to create such resources using
A. Terms in BFO-compliant ontologies (e.g., mid-level)
B. WordNet-KYOTO-DOLCE-Lite-BFO mappings
Processing
– Get single/complex terms
– Get headword of complex terms
– Create synsets of single & complex terms linked to headwords
A. Map to BFO categories via terms from BFO-compliant ontologies
B. Map single terms/headwords to WN synsets ➔ search synsets’ ids in KYOTO ➔ get DOLCE categories ➔ map to BFO categories
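
The two mapping routes (A: terms from BFO-compliant ontologies; B: WordNet synset, then KYOTO, then DOLCE, then BFO) amount to a chain of lookups. The Python below is a sketch only: all four tables are toy stand-ins with made-up identifiers, and the last-word headword heuristic is an assumption that would only suit simple English noun phrases.

```python
# Route A: terms taken from BFO-compliant ontologies (toy entry).
ONTOLOGY_TERMS = {"connective tissue": "OBJECT"}

# Route B: toy stand-ins for the WordNet, KYOTO and DOLCE-Lite steps.
# The identifiers and category labels are invented for the example.
WORDNET_SYNSET = {"tissue": "wn-tissue-n-1", "movement": "wn-movement-n-1"}
KYOTO_TO_DOLCE = {"wn-tissue-n-1": "physical-object", "wn-movement-n-1": "process"}
DOLCE_TO_BFO = {"physical-object": "OBJECT", "process": "PROCESS"}


def headword(term):
    """Naive heuristic: the head of a complex term is its last word."""
    return term.split()[-1]


def bfo_category(term):
    """Return a BFO category label for a term, or None if unmapped."""
    term = term.lower()
    # Route A: direct hit in a BFO-compliant ontology.
    if term in ONTOLOGY_TERMS:
        return ONTOLOGY_TERMS[term]
    # Route B: term or headword -> synset -> DOLCE category -> BFO category.
    synset = WORDNET_SYNSET.get(term) or WORDNET_SYNSET.get(headword(term))
    dolce = KYOTO_TO_DOLCE.get(synset)
    return DOLCE_TO_BFO.get(dolce)
```

Falling back to the headword lets a complex term such as "dense regular connective tissue" inherit the category of "tissue" when the full term is not listed.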

43 CONCLUSION

44 Output resources
Bilingual multi-domain training corpora (comparable & parallel corpora)
From corpora
– Examples for BFO categories
– Lexicons mapped to BFO categories & relations
Annotation manual
Annotation and validation interfaces

45 Future work
Conduct large-scale corpus studies
➔ Annotation campaign
➔ Larger training corpora for machine learning
Possible use of the produced linguistic resources to automatically generate textual definitions from logical ones, and vice versa

46 THANK YOU
seljamar@buffalo.edu
http://seljaseppala.wordpress.com

47 References
Juan Sager. 1990. A practical course in terminology processing. John Benjamins, Amsterdam, Philadelphia.
Juan Sager and Marie-Claude L’Homme. 1994. A model for the definition of concepts: Rules for analytical definitions in terminological databases. Terminology, 1(2):351–373.
Selja Seppälä. 2006. Semi-automatic checking of terminographic definitions. In International Workshop on Terminology design: quality criteria and evaluation methods (TermEval), LREC 2006, 22–27, Genoa, Italy.
Selja Seppälä. 2010. Automatiser la rédaction de définitions terminographiques : questions et traitements. In RECITAL 2010, Montréal, Canada, 19–22 juillet.
Selja Seppälä. 2012. Contraintes sur la sélection des informations dans les définitions terminographiques: vers des modèles relationnels génériques pertinents. PhD thesis, Département de traitement informatique multilingue (TIM), Faculté de traduction et d’interprétation, Université de Genève.

48 Outcomes
Pilot study suggests that BFO-based models are predictive of definitions’ contents
When not predictive, BFO categories and relations can be used as a fixed metalanguage
Methodology may reveal ‘typical features’ for defining each category ➔ weighted differentia
Creation of linguistic resources for a better integration of BFO into NLP processing

49 Advantages
(Semi-)formal structuring of textual definitions
➔ Homogeneity
➔ Internal structure linked to the ontology
No complex BFO labels displayed (only in metadata)
➔ Increased readability & user-friendliness
Allows more complex querying via the definition (through metadata AND text)
Domain-adaptable through corpus analysis
Methodology applicable to
– More specific categories from ontologies extending BFO
– Other upper-level ontologies

