© Paul Buitelaar: Lexical Semantics and Ontologies Tutorial at ACL/HCSnet, July 2006, Melbourne, Australia Lexical Semantics and Ontologies Tutorial at.


1 © Paul Buitelaar: Lexical Semantics and Ontologies Tutorial at ACL/HCSnet, July 2006, Melbourne, Australia Lexical Semantics and Ontologies Tutorial at the ACL/HCSnet 2006 Advanced Program in Natural Language Processing Paul Buitelaar Language Technology Lab & Competence Center Semantic Web DFKI GmbH Saarbrücken, Germany

2 Overview
Day 1: Words and Meanings
• Human language as a system
• How do words relate to each other?
Day 2: Words and Object Descriptions
• Human language as a means of representation
• How do words represent objects in the/a world?

3 Day 1 - Introduction
Words and Meanings
• Synsets and Senses
  Lexical Semantics in WordNet
• Related Senses
  Generative Lexicon and CoreLex
• Domains and Senses
  Tuning WordNet to a Domain

4 Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain

5 Lexical Semantic Resource
• Semantic Lexicon
  Maps words to meanings (senses)
• Lexical Database
  Machine readable (has a formal structure)
  Freely available
• WordNet

6 WordNet - Origins
In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database … The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically … WordNet … instantiates hypotheses based on results of psycholinguistic research … expose such hypotheses to the full range of the common vocabulary
In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter “apple,” even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)
Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. “Introduction to WordNet: an on-line lexical database.” In: International Journal of Lexicography 3 (4), 1990.

7 Synsets
WordNet is organized around word meaning (not word forms as with traditional lexicons)
• Word meaning is represented by “synsets”
• A synset is a “Set of Synonyms”
Example
• {board, plank}: piece of lumber
• {board, committee}: group of people
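
The synset idea can be sketched as a small data structure. This is a minimal illustration, not WordNet's actual storage format: the names `SYNSETS` and `senses_by_word` are made up for the example, and only the two synsets from the slide are included.

```python
from collections import defaultdict

# Toy synsets taken from the slide: (set of synonyms, gloss).
SYNSETS = [
    ({"board", "plank"}, "piece of lumber"),
    ({"board", "committee"}, "group of people"),
]

def senses_by_word(synsets):
    """Index synsets by member word: a word has one sense per synset it occurs in."""
    index = defaultdict(list)
    for members, gloss in synsets:
        for word in sorted(members):
            index[word].append(gloss)
    return dict(index)

SENSES = senses_by_word(SYNSETS)
print(SENSES["board"])   # both glosses: "board" is ambiguous
print(SENSES["plank"])   # a single gloss
```

Because "board" belongs to two synsets, the index gives it two senses, while "plank" and "committee" get one each.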

8 Synset Hierarchy
Synsets are organized in hierarchies
• Defines:
  generalization (hypernymy)
  specialization (hyponymy)
Example (from most general to most specific)
{entity}
  …
  {whole, unit}
    {building material}
      {lumber, timber}
        {board, plank}
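
Walking such a hierarchy upward can be sketched as follows. The dictionary `HYPERNYM` and both function names are illustrative assumptions; the chain stops at {whole, unit} because the slide elides the levels between it and {entity}.

```python
# Direct hypernym links as listed on the slide (child synset -> parent synset).
# The levels between {whole, unit} and {entity} are elided on the slide,
# so they are left out here as well.
HYPERNYM = {
    "board, plank": "lumber, timber",
    "lumber, timber": "building material",
    "building material": "whole, unit",
}

def hypernym_path(synset):
    """Return the synset followed by all of its known hypernyms, nearest first."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def is_hyponym_of(synset, candidate):
    """True if `candidate` is a (possibly indirect) hypernym of `synset`."""
    return candidate in hypernym_path(synset)[1:]

print(hypernym_path("board, plank"))
```

Hypernymy is read off by walking up the chain; hyponymy is the same relation seen from the other end.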

9 Hierarchies (WordNet 1.7)

10 Hierarchy Example (WordNet 2.1)

11 Synsets and Senses
Synsets represent word meaning
• Words that occur in several synsets have a corresponding number of meanings (senses)
Example

12 WordNet 2.1

13 (Other) WordNet Relations
Synonymy
• Similar in meaning
Hypernymy/Hyponymy
• Generalization and Specialization
Meronymy
• Part-of
  e.g. study, bathroom, ... meronym house
Antonymy
• Opposite in meaning
  e.g. warm antonym cold

14 Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain

15 Systematic Polysemy
Homonymy
• bank
  embankment: We walked along the bank of the Charles river.
  institution: Did he have an account at the HBU bank?
Systematic Polysemy
• school
  group (of people): The school went for an outing.
  (learning) process: School starts at 8.30.
  organization: The school was founded in …
  building: The school has a new roof.

16 Semantic or Pragmatic?
[Diagram: the lexical item "school" (Lexical Items of the Language) is linked to Obj1, Obj2, Obj3 and Obj4 (Objects in the World); the links are labeled Semantic Analysis and Pragmatic Analysis.]

17 Underspecified Discourse Referents
Anaphora Resolution
• [A long book heavily weighted with military technicalities] (NP: event-physical_object-content), in this edition it is neither so long [event] nor so technical [content] as it was originally.
Metonymy
• The Boston office called
  office > person
  person part-of office
Bridging
• Peter bought a car. The engine runs well.
  engine part-of car
• The Boston office called. They asked for a new price.
  office > person

18 Generative Lexicon Theory
Type Coercion
  I began the book
  book > event
  event ‘has-relation-with’ book
  read is-a event
• multifaceted representation of lexical semantics
• reflecting systematic / regular / logical polysemy

19 Generative Lexicon Theory
Qualia Structure (Pustejovsky 1995)
Formal: inheritance (is-a / hyponymy)
  book formal artifact, communication, …
Constitutive: modification (part-of / meronymy)
  book constitutive section, …
Telic: purpose (“what is the object used for”)
  book telic read, …
Agentive: causality (“how did the object come about”)
  book agentive write, …

20 CoreLex (Buitelaar 1998)
Automatic Qualia Structure Acquisition
• CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy
• These representations can be viewed as shallow Qualia Structures
Sense Distribution in WordNet
• Systematic polysemy can be empirically studied in WordNet by observing sense distributions
>> If more than two words share the same sense distribution (i.e. have the same set of senses), then this may indicate a pattern of systematic polysemy (adapted from Apresjan 1973)
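
The sense-distribution test can be sketched in a few lines. This is a simplified illustration, not the CoreLex implementation: the word-to-type assignments in `BASIC_TYPES` are hypothetical toy data, and `polysemous_classes` is a name invented for the example.

```python
from collections import defaultdict

# Hypothetical toy data: word -> basic type of each of its senses.
BASIC_TYPES = {
    "book": ["artifact", "artifact", "communication"],
    "journal": ["artifact", "communication"],
    "novel": ["communication", "artifact"],
    "alligator": ["animal", "natural_object"],
    "ermine": ["animal", "natural_object"],
    "hammer": ["artifact"],
}

def polysemous_classes(basic_types, min_words=3):
    """Group words by their set of basic types; keep classes with several types
    that are shared by more than two words (min_words=3)."""
    groups = defaultdict(list)
    for word, types in basic_types.items():
        signature = tuple(sorted(set(types)))
        groups[signature].append(word)
    return {
        " ".join(signature): sorted(words)
        for signature, words in groups.items()
        if len(signature) > 1 and len(words) >= min_words
    }

CLASSES = polysemous_classes(BASIC_TYPES)
print(CLASSES)
```

With this toy data only "artifact communication" qualifies: "animal natural_object" is shared by just two words, and "hammer" has a single basic type.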

21 Systematic Polysemous Classes
book
  1. {publication} => artifact
  2. {product, production} => artifact
  3. {fact} => communication
  4. {dramatic_composition, dramatic_work} => communication
  5. {record} => communication
  6. {section, subdivision} => communication
  7. {journal} => artifact
Systematic Polysemous Class “artifact communication”
amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick

22 From WordNet to CoreLex
[Diagram: Noun 1 … Noun n are mapped to Basic Types (Basic Type 1, …), which combine into Systematic Polysemous Class 1 … Systematic Polysemous Class n.]

23 Other Examples
“animal natural_object”: alligator broadtail chamois ermine lapin leopard muskrat ...
“natural_object plant”: algarroba almond anise baneberry butternut candlenut cardamon ...
“action artifact group_social”: artillery assembly band church concourse dance gathering institution ...
“action attribute event psychological”: appearance concentration decision deviation difference impulse outrage …
“possession quantity_definite”: cent centime dividend gross penny real shilling

24 CoreLex vs. WordNet

25 Representation and Interpretation
“Dotted Types” (Pustejovsky)
• Lexical types are either simple (human, artifact, ...) or complex (information AND physical_object)
• Can be represented with a “dotted type”, e.g. information • physical_object
• In (Cooper 2005) interpreted as a record type (a delicious lunch can take forever):

26 Related Work
Apresjan 1973
• Regular Polysemy.
Nunberg & Zaenen 1992
• Systematic polysemy in lexicology and lexicography.
Bill Dolan 1994
• Word Sense Ambiguation: Clustering Related Senses.
Copestake & Briscoe 1996
• Semi-productive polysemy and sense extension.
Peters, Peters & Vossen 1998
• Automatic Sense Clustering in EuroWordNet.
Tomuro 1998
• Semi-Automatic Induction of Systematic Polysemy from WordNet.

27 Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain

28 Reducing Ambiguity
WordNet has too many senses …
Reduce Ambiguity
• Cluster related senses (CoreLex)
• Tune WordNet to an application domain

29 Domains and Senses
Domains determine Sense Selection, e.g.
• English: cell
  prison cell in the Politics/Law domain
  living cell in the Biomedical domain
• English: tissue
  living tissue in the Biomedical domain
  cloth in the Fashion domain
• German: Probe
  test in the Biomedical domain
  rehearsal in the Theater domain
>> Compute Domain-Specific Sense

30 Approaches
Subject Codes
• Domain codes are in the dictionary
Topic Signatures
• Compute (domain-specific) context models from dictionary definitions, domain corpora, web resources
Tuning of WordNet to a domain
• Top Down: Cucchiarelli & Velardi, 1998
• Bottom Up: Buitelaar & Sacaleanu, 2001
• Related recent work: McCarthy et al., 2004; Chan & Ng, 2005; Mohammad & Hirst, 2006

31 Subject Codes
Subject Codes (as used in LDOCE) indicate a domain in which a word is used in a particular sense
Examples (2600 codes)
• Sub-Field Codes
  MDZP (Medicine:Physiology)
• Code Combinations
  MLCO (Meteorology+Building) e.g. lightning conductor
  MLUF (Meteorology+Europe+France) e.g. Mistral
high
  SN (sounds)
  DG (drugs)
  ML (meteorology)

32 Adding Subject Codes to WordNet
Grouping Synsets together across POS
  MEDICINE
    Nouns: doctor#1, hospital#1
    Verbs: operate#7
Grouping Synsets together across Sub-Hierarchies
  SPORT
    life_form#1: athlete#1
    physical_object#1: game_equipment#1
    act#2: sport#1
    location#1: playing_field#1
Magnini B. & Cavaglià G. Integrating Subject Field Codes into WordNet. In: Proceedings LREC 2000

33 WordNet DOMAINS
Sense | WordNet synset and gloss | Domains
1 | Depository, financial institution, bank, banking concern, banking company (a financial institution) | Economy
2 | Bank (sloping land) | Geography, Geology
3 | Bank (a supply or stock held in reserve) | Economy
4 | Bank, bank building (a building) | Architecture, Economy
5 | Bank (an arrangement of similar objects) | Factotum
6 | Savings bank, coin bank, money box, bank (a container) | Economy
7 | Bank (a long ridge or pile) | Geography, Geology
8 | Bank (the funds held by a gambling house) | Economy, Play
9 | Bank, cant, camber (a slope in the turn of a road) | Architecture
10 | Bank (a flight maneuver) | Transport
Bernardo Magnini, Carlo Strapparava, Giovanni Pezzuli, and Alfio Gliozzo. Using domain information for word sense disambiguation. In: Proceedings of the SENSEVAL2 workshop 2001.

34 WSD with Subject Codes
Match between the set of words in the context of the ambiguous word and the set of words (“neighborhoods”) in the definitions + sample sentences of all senses that share a Subject Code
bank: Economics
  write, safe, sum, account, person, put, take, money, order, keep, pay, supply, paper, draw, cheque
bank: Medicine and Biology
  medicine, product, hold, origin, place, human, treatment, blood, hospital, use, store, organ, comb
Guthrie J. A. & Guthrie I. & Wilks Y. & Aidinejad H. Subject Dependent Co-Occurrence and Word Sense Disambiguation. In: Proceedings of ACL
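
A much simplified version of this matching step is a plain overlap count between the context and each neighborhood. The sketch below uses the two neighborhoods from the slide; the function name `pick_subject_code` and the raw overlap score are assumptions made for illustration, not the scoring used in the cited paper.

```python
# Neighborhoods for "bank" as listed on the slide, keyed by Subject Code.
NEIGHBORHOODS = {
    "Economics": {
        "write", "safe", "sum", "account", "person", "put", "take", "money",
        "order", "keep", "pay", "supply", "paper", "draw", "cheque",
    },
    "Medicine and Biology": {
        "medicine", "product", "hold", "origin", "place", "human", "treatment",
        "blood", "hospital", "use", "store", "organ", "comb",
    },
}

def pick_subject_code(context_words, neighborhoods):
    """Score each Subject Code by word overlap with the context; return best code and all scores."""
    context = {word.lower() for word in context_words}
    scores = {code: len(context & words) for code, words in neighborhoods.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first code listed
    return best, scores

code, scores = pick_subject_code(
    "she went to the bank to draw money from her account".split(), NEIGHBORHOODS
)
print(code, scores)
```

A real system would normalize for neighborhood size and handle inflected forms; the raw count only shows the principle.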

35 Topic Signatures from the Web
Construct Topic Signatures for WordNet synsets/senses
• Retrieve document collections from the web and use queries constructed for each WordNet sense, e.g.
  ( boy AND ( altar boy OR ball boy OR … OR male person ) AND NOT ( man OR … OR broth of a boy OR son OR … OR mama’s boy OR black ) )
Agirre E. & Ansa O. & Hovy E. & Martinez D. Enriching very large ontologies using the WWW. In: Proc. of the Ontology Learning Workshop, ECAI 2000
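
Query construction of this kind can be sketched as simple string assembly. The function `build_sense_query` and the shortened cue lists are illustrative assumptions; the slide's own lists are longer and partly elided.

```python
def build_sense_query(word, sense_cues, other_sense_cues):
    """Build a boolean query: the word, plus cues for the target sense,
    minus cues that belong to the word's other senses."""
    positive = " OR ".join(sense_cues)
    negative = " OR ".join(other_sense_cues)
    return f"( {word} AND ( {positive} ) AND NOT ( {negative} ) )"

# Shortened cue lists; the slide's lists contain more (partly elided) items.
QUERY = build_sense_query(
    "boy",
    sense_cues=["altar boy", "ball boy", "male person"],
    other_sense_cues=["man", "son", "black"],
)
print(QUERY)
```

The documents retrieved for each sense-specific query then serve as the collection from which that sense's topic signature is computed.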

36 Top Down Tuning – Cucchiarelli & Velardi
Automatically find the best set of (WordNet) senses that:
• “… represent at best the semantics of the domain”
• “[has the] … ‘right’ level of abstraction, so as to mediate between over-ambiguity and generality”
• “… [is] balanced …, i.e. words should be evenly distributed among categories”
Alessandro Cucchiarelli, Paola Velardi. Finding a domain-appropriate sense inventory for semantically tagging a corpus. Natural Language Engineering 4/4, Dec.

37 Methods Used
Create alternative sets of balanced categories by use of an adapted version of the Hearst/Schütze algorithm
Apply a scoring function to find the best set, with parameters:
• Generality
  Highest possible level of generalization with a small number of categories is preferred
• Discrimination Power
  Different senses lead to different categories
• (Domain) Coverage
  Words in the domain corpus that are represented by the selected categories
• Average Ambiguity
  Ambiguity reduction is measured by the inverse of the average ambiguity of all words

38 Balanced Categories - Hearst/Schütze
Reduce the WordNet noun hierarchy to a set of 726 disjoint categories, each consisting of a relatively large number of synsets and of an average size, with as small a variance as possible
Group categories together into a set of 106 super-categories according to mutual co-occurrence in a training corpus
Measure the frequency of categories on domain corpora
  United States Constitution: legal_system, government, politics, ...
  Genesis: religion, breads, mythology, ...
Hearst M. & Schütze H. Customizing a Lexicon to Better Suit a Computational Task. In: Proceedings ACL SIGLEX Workshop

39 Generality
Generality of category set $C_i$: $1/DM(C_i)$
$DM(C_i)$ is the average distance between the categories of $C_i$ and the topmost synsets.
[Diagram: Topmost Synset, General Synset, and the categories $C_{i1}$ and $C_{i2}$ with their distances to the topmost synsets.]
Example: $C_i = \{C_{i1}, C_{i2}\}$, $DM(C_i) = (\,\ldots\,)/2 = 3.25$

40 Discrimination Power
Discrimination power of category set $C_i$: $(N_c(C_i) - N_{pc}(C_i)) / N_c(C_i)$
where $N_c(C_i)$ is the number of words that reach at least one category of $C_i$ and $N_{pc}(C_i)$ is the number of words that have at least two senses that reach the same category $c_{ij}$ of $C_i$
[Diagram: domain words $w_1$, $w_2$, $w_3$ reach, through their senses and general synsets, the categories of $C_i = \{C_{i1}, C_{i2}, C_{i3}, C_{i4}\}$.]
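
The discrimination power formula can be sketched directly from its definition. The input format (each word mapped to the category reached by each of its senses, `None` when a sense reaches no category) and the toy data are assumptions made for this example.

```python
from collections import Counter

def discrimination_power(word_sense_categories):
    """(N_c - N_pc) / N_c for a category set.

    word_sense_categories maps each word to a list with one entry per sense:
    the category that sense reaches, or None if it reaches none.
    """
    reached = {
        word: [cat for cat in cats if cat is not None]
        for word, cats in word_sense_categories.items()
    }
    reached = {word: cats for word, cats in reached.items() if cats}
    n_c = len(reached)  # words reaching at least one category
    # words with two or more senses that land in the same category
    n_pc = sum(1 for cats in reached.values() if max(Counter(cats).values()) >= 2)
    return (n_c - n_pc) / n_c if n_c else 0.0

# Hypothetical toy data.
TOY = {
    "w1": ["C1", "C2"],   # senses separated by the categories
    "w2": ["C2", "C2"],   # two senses collapse into one category
    "w3": ["C3"],
    "w4": [None],         # reaches no category, so it is not counted in N_c
}
DP = discrimination_power(TOY)
print(DP)
```

Here $N_c = 3$ and $N_{pc} = 1$, so the score is 2/3; a category set that keeps all senses of every word apart would score 1.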

41 Coverage & Average Ambiguity
Coverage of category set $C_i$: $N_c(C_i)/W$
where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$
Inverse of average ambiguity of category set $C_i$: $1/A(C_i)$
where $N_c(C_i)$ is the number of words that reach at least one category in $C_i$, and for each word $w$ in this set, $Cw_j(C_i)$ is the number of categories in $C_i$ reached
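
Both scores can be sketched under two assumptions that the slide does not state explicitly: that $W$ is the total number of words considered in the domain corpus, and that the average ambiguity is the mean number of categories reached per covered word, i.e. $A(C_i) = \frac{1}{N_c(C_i)} \sum_j Cw_j(C_i)$. The function names and toy data are illustrative.

```python
def coverage(word_categories, total_words):
    """N_c / W: share of the W domain words that reach at least one category."""
    n_c = sum(1 for cats in word_categories.values() if cats)
    return n_c / total_words

def inverse_average_ambiguity(word_categories):
    """1 / A, with A assumed to be the mean number of categories reached
    by the words that reach at least one category."""
    reached = [len(cats) for cats in word_categories.values() if cats]
    if not reached:
        return 0.0
    return len(reached) / sum(reached)

# Hypothetical toy data: word -> set of categories reached by its senses.
TOY = {
    "w1": {"C1", "C2"},
    "w2": {"C2"},
    "w3": {"C3"},
    "w4": set(),
}
COV = coverage(TOY, total_words=len(TOY))
INV_AMB = inverse_average_ambiguity(TOY)
print(COV, INV_AMB)
```

In the toy data three of four words are covered (coverage 0.75), and the covered words reach 4/3 categories on average, so the inverse average ambiguity is also 0.75.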

42 Best Category Set (WSJ)
Top Down categories for the financial domain, based on the Wall Street Journal
Category | Higher-level synset
C1 | person, individual, someone, mortal, human, soul
C2 | instrumentality, instrumentation
C3 | written communication, written language
C4 | message, content, subject matter, substance
C5 | measure, quantity, amount, quantum
C6 | action
C7 | activity
C8 | group action
C9 | organization
C10 | psychological feature
C11 | possession
C12 | state
C13 | location

43 Sense Selection with WSJ Set
Senses for stock - kept by domain tuning on the Wall Street Journal
Sense | Synset hierarchy for sense | Top synset for sense
1 | capital > asset | possession (C11)
2 | support > device | instrumentality (C2)
4 | document > writing | written communication (C3)
5 | accumulation > asset | possession (C11)
6 | ancestor > relative | person (C1)
Senses for stock - discarded by domain tuning on the Wall Street Journal
Sense | Synset hierarchy for sense
3 | stock, inventory > merchandise, wares > …
7 | broth, stock > soup > …
8 | stock, caudex > stalk, stem > …
9 | stock > plant part > …
10 | stock, gillyflower > flower > …
11 | malcolm stock, stock > flower …
12 | lineage, line of descent > … > genealogy > …
14 | lumber, timber > …
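
Sense selection against a tuned category set amounts to a filter over hypernym paths. The sketch below uses a few of the slide's senses of "stock"; the paths are abbreviated as on the slide, and `select_senses` is a hypothetical helper name.

```python
# A subset of the WSJ category set from the previous slide.
WSJ_CATEGORIES = {"person", "instrumentality", "written communication", "possession"}

# Abbreviated hypernym paths for some senses of "stock", as given on the slide;
# intermediate levels are elided there and therefore missing here too.
STOCK_SENSE_PATHS = {
    1: ["capital", "asset", "possession"],
    2: ["support", "device", "instrumentality"],
    3: ["stock, inventory", "merchandise, wares"],
    4: ["document", "writing", "written communication"],
    5: ["accumulation", "asset", "possession"],
    6: ["ancestor", "relative", "person"],
    7: ["broth, stock", "soup"],
}

def select_senses(sense_paths, categories):
    """Keep the senses whose hypernym path reaches one of the selected categories."""
    return sorted(
        sense for sense, path in sense_paths.items()
        if any(node in categories for node in path)
    )

KEPT = select_senses(STOCK_SENSE_PATHS, WSJ_CATEGORIES)
print(KEPT)
```

Senses whose paths never reach a selected category, such as the "broth" reading, drop out of the domain-specific inventory.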

44 Bottom Up Tuning – Buitelaar & Sacaleanu
Ranking of WordNet synsets according to a domain-specific corpus
• Compute term relevance against a reference corpus
• Compute synset relevance according to term relevance (where term = synonym in synset)
• Ranking can be used in WSD (similar to usage of the ‘most frequent’ heuristic)
Paul Buitelaar, Bogdan Sacaleanu. Ranking and Selecting Synsets by Domain Relevance. In: Proceedings of WordNet and Other Lexical Resources: Applications, Extensions and Customizations, NAACL 2001 Workshop, June 3/4 2001

45 TFIDF
tf(w): term frequency (number of word occurrences in a document)
df(w): document frequency (number of documents containing the word)
N: number of all documents
tfIdf(w): relative importance of the word in the document
The word is more important if it appears several times in a target document
The word is more important if it appears in fewer documents
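
The slide's formula did not survive in the transcript. The sketch below uses the common variant $tfIdf(w) = tf(w) \cdot \log(N / df(w))$, which is an assumption; the toy documents and the function name `tf_idf` are also illustrative.

```python
import math

def tf_idf(word, document, corpus):
    """tf(w) * log(N / df(w)), with documents given as lists of tokens."""
    tf = document.count(word)                      # occurrences in the target document
    df = sum(1 for doc in corpus if word in doc)   # documents containing the word
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# Hypothetical toy corpus.
DOCS = [
    ["cell", "membrane", "cell"],
    ["prison", "cell"],
    ["court", "law"],
]
print(tf_idf("cell", DOCS[0], DOCS))      # frequent here, but also found elsewhere
print(tf_idf("membrane", DOCS[0], DOCS))  # rarer across the corpus
```

Both intuitions from the slide are visible: repeating a word in the target document raises its score, and spreading it over more documents lowers it.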

46 Term and Synset Relevance
Term Relevance
• Relevance Score of Synset Members
  where t represents the term, d the domain, and N is the total number of domains
Synset Relevance
• Cumulated Relevance Score for a Synset

47 Extended Synset Relevance
Lexical Coverage
• Take Length of the Synset Into Account
  [Gefängniszelle, Zelle] ("prison cell")
  [Zelle] ("living cell")
Hyponyms
• Take Hyponyms Into Account
  [Zelle, Gefängniszelle, Todeszelle]
  [Zelle, Körperzelle, Pflanzenzelle]
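
Synset ranking with hyponyms included can be sketched as follows. The formulas on the slides were lost in the transcript, so the scoring here is an assumption: term relevance is taken as $\log(tf_{t,d}) \cdot \log(N / df_t)$, with $df_t$ the number of domains in which the term occurs, and synset relevance as the sum over its terms. All counts are hypothetical.

```python
import math

def term_relevance(term, domain, domain_counts):
    """Assumed score: log(tf of term in domain) * log(N / number of domains containing the term)."""
    tf = domain_counts[domain].get(term, 0)
    if tf == 0:
        return 0.0
    df = sum(1 for counts in domain_counts.values() if counts.get(term, 0) > 0)
    return math.log(tf) * math.log(len(domain_counts) / df)

def synset_relevance(terms, domain, domain_counts):
    """Cumulated relevance of the synset's terms (here: synonyms plus hyponyms)."""
    return sum(term_relevance(term, domain, domain_counts) for term in terms)

def rank_synsets(synsets, domain, domain_counts):
    """Order synset names by relevance to the domain, most relevant first."""
    return sorted(
        synsets,
        key=lambda name: synset_relevance(synsets[name], domain, domain_counts),
        reverse=True,
    )

# Hypothetical term frequencies per domain corpus.
COUNTS = {
    "medicine": {"Zelle": 50, "Körperzelle": 10},
    "law": {"Zelle": 5, "Gefängniszelle": 8},
    "sports": {"Ball": 20},
}
# The two senses of "Zelle" from the slide, extended with hyponyms.
SYNSETS = {
    "prison cell": ["Zelle", "Gefängniszelle", "Todeszelle"],
    "living cell": ["Zelle", "Körperzelle", "Pflanzenzelle"],
}
print(rank_synsets(SYNSETS, "medicine", COUNTS))
print(rank_synsets(SYNSETS, "law", COUNTS))
```

Without the hyponyms, the shared term "Zelle" would give both senses nearly the same score in a domain; the domain-specific hyponyms are what separate them in this toy setup.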

48 Experiment – Medical Domain

49 Related Recent Work
Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll
• Finding predominant senses in untagged text. In Proc. of ACL
Chan, Yee Seng and Ng, Hwee Tou (2005)
• Word Sense Disambiguation with Distribution Estimation. Proc. of IJCAI
Mohammad, Saif and Hirst, Graeme
• Determining word sense dominance using a thesaurus. Proc. of EACL 2006.

50 Day 2 - Introduction
Words and Object Descriptions
• Semantics on the Semantic Web
  Semantic Web, Ontologies and Natural Language Processing
• The Lexical Semantic Web
  Knowledge Representation as Word Meaning
• A Lexicon Model for Ontologies
  Enriching Ontologies with Linguistic Information

51 Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies

52 Web
The Web consists of non-interpreted data: Text, DBs, Images, Tables

53 Web Markup
Interpretation through Markup - Categories

54 “Web 2.0” Markup
Interpretation through Markup – User Tags

55 “Web 2.0” Markup
Interpretation through Markup – User Tags

56 Semantic Web Knowledge Markup
Formal Interpretation - Knowledge Markup
Ontologies

57 Semantic Web Knowledge Markup
Formal Interpretation - Knowledge Markup
Ontologies

58 Semantic Web Knowledge Markup
Formal Interpretation - Knowledge Markup
Ontologies

59 Knowledge Markup, Ontologies
Turns the Web into a Knowledge Base

60 Knowledge Markup, Ontologies, Semantic Web Services
Enables Semantic Web Services …

61 Knowledge Markup, Ontologies, Semantic Web Services, Intelligent Man-Machine Interface
… and Intelligent Man-Machine Interface

62 Semantic Web Layer cake

63 Resource Description Framework (RDF)
[Graph: node1 with outgoing properties name ("DFKI GmbH"), location ("Kaiserslautern") and www.]
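
The RDF graph on this slide is a set of subject-predicate-object triples. A minimal sketch in plain Python, without an RDF library: the blank-node label `_:node1` and the helper `objects` are illustrative, and the `www` property is left out because its value did not survive in the transcript.

```python
# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("_:node1", "name", "DFKI GmbH"),
    ("_:node1", "location", "Kaiserslautern"),
]

def objects(triples, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(TRIPLES, "_:node1", "name"))
print(objects(TRIPLES, "_:node1", "location"))
```

In real RDF the predicates would be URIs from a vocabulary rather than bare strings; the triple structure is the same.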

64 RDF : XML-based Representation
[XML listing not preserved; the remaining literals are "DFKI GmbH" and "Kaiserslautern".]

65 RDF Schema (RDFS)
Representation of classes and properties
[Diagram: classes Person, Teacher, Student, Course and rdf:Literal; relations is-a, name, teaches, enrolledIn.]
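
What an RDFS-style schema adds over plain triples can be sketched with a subclass table and property domains. The diagram is only partly recoverable, so the assignments below (Teacher and Student as subclasses of Person, `name` on Person, `teaches` on Teacher, `enrolledIn` on Student, both pointing to Course) are assumptions, as are the function names.

```python
# Assumed reading of the slide's diagram.
SUBCLASS_OF = {"Teacher": "Person", "Student": "Person"}
PROPERTIES = {
    # property: (domain, range)
    "name": ("Person", "rdf:Literal"),
    "teaches": ("Teacher", "Course"),
    "enrolledIn": ("Student", "Course"),
}

def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def applicable_properties(cls):
    """Properties whose domain covers the class, including inherited ones."""
    return sorted(p for p, (domain, _range) in PROPERTIES.items() if is_a(cls, domain))

print(applicable_properties("Teacher"))
print(applicable_properties("Student"))
```

The is-a links let a Teacher inherit `name` from Person, which is the kind of inference a schema layer enables.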

66 RDFS : XML-based Representation

67 Web Ontology Language (OWL)
OWL adds further modelling vocabulary on top of RDFS, e.g.
• Class equivalence
• Property types (data vs. object property)
Based on Description Logics, three versions
• OWL Lite
• OWL DL
• OWL Full

68 OWL
Extended knowledge representation
[Diagram: classes Person, Teacher, Student, Course and rdf:Literal; relations is-a, name, teaches, enrolledIn, plus a disjoint constraint.]

69 OWL : XML-based Representation

70 XML – RDF – RDFS - OWL
[Diagram labels:]
XML: Syntax
XML Schema: Data Types
Namespaces: Interpretation Context
RDF: Semantics
RDF Schema: Formalization: Class Definition, Properties
OWL: Formalization: extended Class Definition, Properties, Property Types

71 Ontologies – What they are
Ontology refers to an engineering artifact
• a specific vocabulary used to describe a certain reality
• a set of explicit assumptions regarding the intended meaning of the vocabulary
An Ontology is
• an explicit specification of a conceptualization [Gruber 93]
• a shared understanding of a domain of interest [Uschold/Gruninger 96]

72 Ontologies – Why you need them
Make domain assumptions explicit
• Easier to exchange domain assumptions
• Easier to understand and update legacy data
Separate domain knowledge from operational knowledge
• Re-use domain and operational knowledge separately
A community reference for applications
Shared understanding of what particular information means

73 Applications of Ontologies
NLP
• Information Extraction, e.g. Buitelaar et al. 06, Mädche, Staab & Neumann 00, Nedellec, Rebholz
• Information Retrieval (Semantic Search), e.g. WebKB (Martin et al. 00), OntoSeek (Guarino et al. 99), Ontobroker (Decker et al. 99)
• Question Answering, e.g. Harabagiu, Schlobach & de Rijke, Aqualog (Lopez and Motta 04)
• Machine Translation, e.g. Nirenburg et al. 04, Beale et al. 95, Hovy, Knight
Other
• Business Process Modeling, e.g. Uschold et al. 98
• Digital Libraries, e.g. Amann & Fundulaki 99
• Information Integration, e.g. Kashyap 99; Wiederhold 92
• Knowledge Management (incl. Semantic Web), e.g. Fensel 01, Staab & Schnurr 00; Sure et al. 00, Abecker et al. 97
• Software Agents, e.g. Gluschko et al. 99; Smith & Poulter 99
• User Interfaces, e.g. Kesseler 96

74 Ontologies and Their Relatives
[Spectrum diagram, in the order given:]
Catalogs
Glossaries & Terminologies
Thesauri
Semantic Networks
Formal isa
Formal Instance
General logical constraints
Axioms: Disjoint/Inverse …

75 Thesauri – Examples : EuroVoc
EuroVoc
• covers terminology in all of the official EU languages
• for all fields (27) that concern the EU institutions, e.g. politics, trade, law, science, energy, agriculture
Example entry
  MT 3606 natural and applied sciences
  UF gene pool
     genetic resource
     genetic stock
     genotype
     heredity
  BT1 biology
  BT2 life sciences
  NT1 DNA
  NT1 eugenics
  RT genetic engineering (6411)

76 Thesauri – Examples : MeSH
MeSH (Medical Subject Headings)
• organized by terms (~ 250,000) that correspond to medical subjects
• for each term, syntactic, morphological or semantic variants are given
Example entry
  MeSH Heading: Databases, Genetic
  Entry Term: Genetic Databases
  Entry Term: Genetic Sequence Databases
  Entry Term: OMIM
  Entry Term: Online Mendelian Inheritance in Man
  Entry Term: Genetic Data Banks
  Entry Term: Genetic Data Bases
  Entry Term: Genetic Databanks
  Entry Term: Genetic Information Databases
  See Also: Genetic Screening

77 Semantic Networks - Examples : UMLS
Unified Medical Language System
• integrates linguistic, terminological and semantic information
• Semantic Network consists of 134 semantic types and 54 relations between types
  Pharmacologic Substance affects Pathologic Function
  Pharmacologic Substance causes Pathologic Function
  Pharmacologic Substance complicates Pathologic Function
  Pharmacologic Substance diagnoses Pathologic Function
  Pharmacologic Substance prevents Pathologic Function
  Pharmacologic Substance treats Pathologic Function

78 Semantic Networks - Examples : GO
GO (Gene Ontology)
• Aligns descriptions of gene products in different databases, including plant, animal and microbial genomes
• Organizing principles are molecular function, biological process and cellular component
Example entry
  Accession: GO:
  Ontology: biological process
  Synonyms: broad: genetic exchange
  Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.
  Term Lineage
    all : all (164142)
    GO: : biological process (115947)
    GO: : development (11892)
    GO: : genetic transfer (69)

79 Ontologies – Example I

80 Ontologies – Example II
[Diagram (Ontology, F-Logic similar). Design: Philipp Cimiano]
Classes: Geographical Entity (GE), Natural GE, Inhabited GE, country, city, river, mountain (linked by is-a)
Instances (instance_of): Germany, Berlin, Stuttgart, Neckar, Zugspitze
Relations: flow_through, located_in, capital_of
Attributes: length (km) 367, height (m) 2962

81 Ontologies for NLP
Information Retrieval
• Query Expansion
Machine Translation
• Interlingua
Information Extraction
• Template Definition
• Semantic Integration
Question Answering
• Question Analysis
• Answer Selection

82 Information Extraction
Class-based Template Definition
• Allows for Reasoning over Extracted Templates with Respect to the Ontology (see e.g. [Nedellec and Nazarenko 2005] for discussion)
Semantic Integration
• Extraction from Heterogeneous Sources (Text, Tables and other Semi-Structured Data, Image Captions) – SmartWeb [Buitelaar et al. 06]
• Multi-Document Information Extraction – ArtEquAKT [Alani et al. 2003]

83 Question Answering
Question Analysis
• Ontology/WordNet-based Semantic Question Interpretation (e.g. [Pasca and Harabagiu 01])
Answer Selection
• Ontology/WordNet-based Reasoning for Answer Type-Checking
Ontology of Events [Sinha and Narayanan 05]
Geographical Ontology, WordNet [Schlobach & de Rijke 04]
WordNet [Pasca and Harabagiu 01]
Ontology-based Question Answering
• Derive Answers from a Knowledge Base (e.g. Aqualog [Lopez & Motta 04])

84 Ontology Life Cycle
Create/Select: Development and/or Selection
Populate: Knowledge Base Generation
Validate: Consistency Checks
Evolve: Extension, Modification
Maintain: Usability Tests
Deploy: Knowledge Retrieval

85 NLP in the Ontology Life Cycle
Ontology Population: Information Extraction
Ontology Learning: Text Mining
KB Retrieval: Question Answering

86 Ontology Learning
[Figure: ontology learning layers: Terms; (Multilingual) Synonyms; Concept Formation; Concept Hierarchy; Relations; Relation Hierarchy; Axiom Schemata; General Axioms]
Design: Philipp Cimiano

87 Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies

88 Dictionary: Words and Senses
Represent interpretations of words through senses, very much like classes that are assigned to a word, e.g. article
1. An individual thing or element of a class…
2. A particular section or item of a series in a written document…
3. A non-fictional literary composition that forms an independent part of a publication…
4. The part of speech used to indicate nouns and to specify their application
5. A particular part or subject; a specific matter or point
(as provided by

89 Ontology: Classes and Labels - I
Ontologies assign labels (i.e. words) to a given class.
In the COMMA ontology on document management, the class article corresponds to sense 2 (‘section of a written document’).

90 Ontology: Classes and Labels - II
In the GOLD ontology on linguistics, the class label article corresponds to sense 4 (‘part of speech’).
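
The article example contrasts a dictionary entry, which lists several senses for one word, with ontologies, each of which fixes one sense through a class label. A minimal Python sketch of that mapping follows; the glosses are abridged from the dictionary slide, and the dictionary layout and the function name `gloss_in_ontology` are assumptions made for illustration.

```python
# Dictionary view: one word, several numbered senses (glosses abridged).
SENSES = {
    "article": {
        1: "an individual thing or element of a class",
        2: "a particular section or item of a series in a written document",
        3: "a non-fictional literary composition in a publication",
        4: "the part of speech used to indicate nouns",
        5: "a particular part or subject; a specific matter or point",
    }
}

# Ontology view: each ontology uses the label for one class, i.e. one sense.
LABEL_TO_SENSE = {
    ("COMMA", "article"): 2,
    ("GOLD", "article"): 4,
}


def gloss_in_ontology(ontology, label):
    """Return the dictionary gloss a class label takes in an ontology, or None."""
    sense_id = LABEL_TO_SENSE.get((ontology, label))
    if sense_id is None:
        return None
    return SENSES[label][sense_id]
```

Looking up the same label in two ontologies returns two different glosses, which is the sense in which a collection of ontologies behaves like a distributed semantic lexicon.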

91 The Meaning of Director - I
The Semantic Web can be viewed as a large, distributed dictionary (or rather a semantic lexicon) in which we can look up the meaning of words, e.g. director
… as a ‘role’ (AgentCities ontology)

92 The Meaning of Director - II
… as ‘head of a program’ (University Benchmark ontology)

93 Exploring the Lexical Semantic Web
Collect ontologies
• OntoSelect
Analyse the use of class/property labels
Treat class/property labels as lexical entries
• Normalize
• Organize by language
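
The last two steps, normalizing labels and organizing them by language, can be sketched in a few lines of Python. The slide does not say how normalization is done, so the rules below (splitting camelCase, replacing underscores and hyphens, lowercasing) are an assumption, as are the function names and the language tags used as input.

```python
import re
from collections import defaultdict


def normalize_label(label):
    """Split camelCase and underscore/hyphen-joined labels into lowercase words."""
    # Insert a space wherever a lowercase letter or digit is followed by a capital.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Treat underscores and hyphens as word separators.
    spaced = re.sub(r"[_\-]+", " ", spaced)
    return " ".join(spaced.lower().split())


def organize_by_language(labels):
    """Group (label, language) pairs into {language: sorted normalized entries}."""
    lexicon = defaultdict(set)
    for label, lang in labels:
        lexicon[lang].add(normalize_label(label))
    return {lang: sorted(entries) for lang, entries in lexicon.items()}
```

For instance, `normalize_label("ProgramDirector")` and `normalize_label("program_director")` both yield the same lexical entry, so variants of a label collapse before they are counted or compared.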

94 Ontology Collection
OntoSelect
• Web Monitor on DAML, RDFS, OWL Files
• Download, Analyze and Store Included Information and Metadata
Class and Property Labels
Multilingual Information
Included Ontologies
• Ontology Ranking and Selection Functionalities

95 OntoSelect

96 Multilinguality on the Semantic Web

97 Multilingual Labels

98 “Lexical Semantic Ambiguity”

99 Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies

100 Ontologies – Example III

101 Ontologies – Example III (continued)
[Figure: classes Campus, University, “Fakultät”, Student, Staff; relations located_at, is_part_of, studies_at, works_at]

102 Ontologies – Example III (continued)
[Figure: classes Campus, University, “Fakultät”, Student, Staff; relations located_at, is_part_of, studies_at, works_at; terms attached to “Fakultät”: has_German_term Fakultät, has_US_English_term School, has_Dutch_term Faculteit]

103 Ontologies – Example III (continued)
[Figure: “Fakultät” is_part_of University and has_term Term; instances of Term: Fakultät (language DE), faculteit (language NL), school (language EN-US)]
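
The step shown across the Example III slides, replacing one relation per language (has_German_term, has_Dutch_term, ...) with a single has_term relation to Term instances that carry a language, can be sketched as follows. The forms and language tags come from the slide; the `Term` dataclass, the `HAS_TERM` table and the `term_for` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A term instance with a surface form and a language tag."""
    form: str
    language: str


# One has_term relation instead of one relation per language; adding a
# language then means adding an instance, not changing the schema.
HAS_TERM = {
    "Fakultät": [
        Term("Fakultät", "DE"),
        Term("faculteit", "NL"),
        Term("school", "EN-US"),
    ]
}


def term_for(cls, language):
    """Return the surface form of cls in the given language, or None if absent."""
    for term in HAS_TERM.get(cls, []):
        if term.language == language:
            return term.form
    return None
```

With this layout, a lookup such as `term_for("Fakultät", "NL")` needs no knowledge of language-specific relation names.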

104 Semiotic Triangle
Ogden & Richards, 1923
based on Structural Linguistics studies (de Saussure, 1916)
adopted in Knowledge Representation (e.g. Sowa, 1984)

105 LingInfo Model – Simplified
Design: Michael Sintek

106 LingInfo Model

107 LingInfo Instances - Example
Fußballspielers “of the football player”

108 LingInfo Predicate-Arg Structure
Design: Anette Frank

114 Conclusions

115 Conclusions
WordNet: Appropriate Use may include
• Introduction of underspecified senses (sense grouping)
• Tuning to a domain
The “Lexical Semantic Web”
• The Semantic Web (and Web 2.0) is a potentially rich resource for (formal) lexical semantics
• Mining such resources for lexical semantics (i.e. compilation of a distributed semantic lexicon) has only just started
• Ontologies to be extended with linguistic/lexical information

