Lecture 18: Ontologies and WordNet. CSCE 771 Natural Language Processing, March 25, 2013.


1. Lecture 18: Ontologies and WordNet
Topics: Ontologies; WordNet; Overview of Meaning
Readings: Text 13.5; NLTK book Chapter 2
March 25, 2013
CSCE 771 Natural Language Processing

2. Overview (CSCE 771 Spring 2013)
Last Time (Programming): Chunking; Chunking with NLTK; HW 5; Project Ideas
Today: app.ChunkParser under NLTK
Readings: Chapter 7
  http://www.nltk.org/howto
  http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
Next Time:

3. Ontologies – the old meaning
http://www.merriam-webster.com/dictionary/ontology
  1. a branch of metaphysics concerned with the nature and relations of being
  2. a particular theory about the nature of being or the kinds of things that have existence

4. Ontologies – the new (CS) meaning
http://en.wikipedia.org/wiki/Ontology_(information_science)
“In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts.”
"Toward Principles for the Design of Ontologies Used for Knowledge Sharing" by Tom Gruber, 1993:
“An ontology is a formal, explicit specification of a shared conceptualization.”

5. Gruber elaborating
"An ontology is a description (like a formal specification of a program) of the concepts and relationships that can formally exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set of concept definitions, but more general. And it is a different sense of the word than its use in philosophy." (Gruber 2001 [8])

6. Focus Levels of Ontologies
Generic, Core, Domain, Task, Application

7. Examples of in-use Ontologies
Medical: UMLS, SNOMED-RT, GALEN, MEDLINE
Linguistics: WordNet (Miller, Princeton, 1990s); GOLD, http://linguistics-ontology.org/

8. Early OWL versions
OWL provides three increasingly expressive sublanguages:
  1. OWL Lite supports those users primarily needing a classification hierarchy and simple constraints.
  2. OWL DL supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time).
  3. OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.
http://www.w3.org/TR/2004/REC-owl-features-20040210/#s1.3

9. OWL 2
The OWL 2 Web Ontology Language, informally OWL 2, is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
http://www.w3.org/TR/owl2-overview/
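OWL 2 documents are exchanged as RDF, and RDF describes everything as (subject, predicate, object) triples. The Python sketch below is a toy that assumes a hand-written triple set: the classes Dog, Canine, Carnivore, Mammal and the individual `rex` are hypothetical, and the only inference it performs is following `rdfs:subClassOf` transitively. It is not an OWL 2 reasoner and does not parse real RDF.

```python
# Toy, in-memory triple store: an ontology written as (subject, predicate, object)
# triples, the shape RDF uses. All names are hypothetical; this is not an OWL reasoner.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("Dog", SUBCLASS_OF, "Canine"),
    ("Canine", SUBCLASS_OF, "Carnivore"),
    ("Carnivore", SUBCLASS_OF, "Mammal"),
    ("rex", RDF_TYPE, "Dog"),  # an individual asserted to be a Dog
}


def superclasses(cls, store):
    """Every class reachable from cls by following rdfs:subClassOf links."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in store:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(individual, store):
    """Asserted rdf:type classes plus all of their superclasses."""
    asserted = {o for s, p, o in store if s == individual and p == RDF_TYPE}
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, store)
    return result


print(sorted(inferred_types("rex", triples)))
# ['Canine', 'Carnivore', 'Dog', 'Mammal']
```

A real toolchain would load the triples from an RDF file and hand them to a reasoner; the point here is only that classes, individuals, and the relations between them reduce to triples.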

10. OWL 2 relationships to other languages
http://www.w3.org/TR/owl2-overview/#Semantics

11. Ontology tools – Editors
Editors: Protégé, http://protege.stanford.edu/

12. Semantic Web
Web: static web pages
Web 2.0 (~1999): http://en.wikipedia.org/wiki/Web_2.0
Semantic Web: "The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
Source: "The Semantic Web" by Tim Berners-Lee, James Hendler, and Ora Lassila, Scientific American, 2001
It is a means of retrieving information from the web (using web spiders that collect RDF files) and of accessing the data through Semantic Web agents or Semantic Web services.

13. Basic NLTK Corpus Functionality
Example                       Description
fileids()                     the files of the corpus
fileids([categories])         the files of the corpus corresponding to these categories
categories()                  the categories of the corpus
categories([fileids])         the categories of the corpus corresponding to these files
raw()                         the raw content of the corpus
raw(fileids=[f1,f2,f3])       the raw content of the specified files
raw(categories=[c1,c2])       the raw content of the specified categories
words()                       the words of the whole corpus
words(fileids=[f1,f2,f3])     the words of the specified fileids
words(categories=[c1,c2])     the words of the specified categories
sents()                       the sentences of the whole corpus
sents(fileids=[f1,f2,f3])     the sentences of the specified fileids
sents(categories=[c1,c2])     the sentences of the specified categories
abspath(fileid)               the location of the given file on disk
encoding(fileid)              the encoding of the file (if known)
open(fileid)                  open a stream for reading the given corpus file
root()                        the path to the root of locally installed corpus
readme()                      the contents of the README file of the corpus
Reference: NLTK Book Chapter 2
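To make the table concrete without needing any corpus on disk, here is a toy in-memory class that mimics a few of these methods (`fileids`, `categories`, `raw`, `words`). It is a hypothetical stand-in, not NLTK's corpus reader: the file ids and texts are invented, and `words()` splits on whitespace where NLTK uses a proper tokenizer.

```python
# Toy stand-in mirroring a few methods from the table above.
# NOT NLTK's implementation; file ids, categories, and texts are made up.
class ToyCategorizedCorpus:
    def __init__(self, files, cats):
        self._files = files  # fileid -> raw text
        self._cats = cats    # fileid -> list of category names

    def fileids(self, categories=None):
        if categories is None:
            return sorted(self._files)
        wanted = {categories} if isinstance(categories, str) else set(categories)
        return sorted(f for f, cs in self._cats.items() if wanted & set(cs))

    def categories(self, fileids=None):
        ids = self._select(fileids)
        return sorted({c for f in ids for c in self._cats[f]})

    def raw(self, fileids=None, categories=None):
        ids = self._resolve(fileids, categories)
        return "\n".join(self._files[f] for f in ids)

    def words(self, fileids=None, categories=None):
        # Naive whitespace tokenization, for illustration only.
        return self.raw(fileids, categories).split()

    def _select(self, fileids):
        if fileids is None:
            return sorted(self._files)
        return [fileids] if isinstance(fileids, str) else list(fileids)

    def _resolve(self, fileids, categories):
        # Categories, when given, determine which files are read.
        if categories is not None:
            return self.fileids(categories)
        return self._select(fileids)


corpus = ToyCategorizedCorpus(
    files={"news1.txt": "The market fell .", "fic1.txt": "Once upon a time ."},
    cats={"news1.txt": ["news"], "fic1.txt": ["fiction"]},
)
print(corpus.fileids(categories=["news"]))   # ['news1.txt']
print(corpus.categories(fileids="fic1.txt"))  # ['fiction']
print(corpus.words(categories="fiction"))     # ['Once', 'upon', 'a', 'time', '.']
```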

14. More from Chapter 2 of NLTK Book
2.2 Conditional Frequency Distributions
    Conditions and Events; Counting Words by Genre; Plotting and Tabulating Distributions; Generating Random Text with Bigrams
2.3 More Python: Reusing Code
    Functions; Modules
2.4 Lexical Resources
    Wordlist Corpora; A Pronouncing Dictionary; Comparative Wordlists; Shoebox and Toolbox Lexicons
2.5 WordNet
Reference: NLTK Book Chapter 2

15. WordNet
George Miller, Princeton University
NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets.
Links:
  http://en.wikipedia.org/wiki/WordNet
  http://wordnet.princeton.edu/
  http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
Reference: NLTK Book Chapter 2

16. WordNet
WordNet distinguishes between nouns, verbs, adjectives, and adverbs; it does not include prepositions, determiners, etc.
Every synset contains a group of synonymous words or collocations.
Different senses of a word are in different synsets.

17. Nouns in WordNet
hypernyms: Y is a hypernym of X if every X is a (kind of) Y (canine is a hypernym of dog, because every dog is a member of the larger category of canines)
hyponyms: Y is a hyponym of X if every Y is a (kind of) X (dog is a hyponym of canine)
coordinate terms: Y is a coordinate term of X if X and Y share a hypernym (wolf is a coordinate term of dog, and dog is a coordinate term of wolf)
holonym: Y is a holonym of X if X is a part of Y (building is a holonym of window)
meronym: Y is a meronym of X if Y is a part of X (window is a meronym of building)
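These definitions pair up as inverses: hyponym reverses hypernym, and meronym reverses holonym, so a lexicon only has to store one direction and can derive the other. The sketch below assumes a tiny hypothetical lexicon built from the slide's examples (plus a made-up `door` entry) and, for simplicity, gives each word a single hypernym, which real WordNet synsets do not always have.

```python
from collections import defaultdict

# Hypothetical toy lexicon: only one direction of each relation is stored.
hypernym_of = {  # X -> Y, meaning "every X is a kind of Y"
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
}
part_of = {  # X -> Y, meaning "X is a part of Y" (Y is a holonym of X)
    "window": "building",
    "door": "building",  # made-up extra entry
}


def invert(relation):
    """Turn an X -> Y mapping into Y -> sorted list of X."""
    inverse = defaultdict(list)
    for x, y in relation.items():
        inverse[y].append(x)
    return {y: sorted(xs) for y, xs in inverse.items()}


hyponyms = invert(hypernym_of)  # derived: hypernym -> its hyponyms
meronyms = invert(part_of)      # derived: whole -> its parts


def coordinate_terms(word):
    """Words sharing a hypernym with `word`, excluding the word itself."""
    parent = hypernym_of.get(word)
    if parent is None:
        return []
    return [w for w in hyponyms[parent] if w != word]


print(hyponyms["canine"])       # ['dog', 'wolf']
print(meronyms["building"])     # ['door', 'window']
print(coordinate_terms("dog"))  # ['wolf']
```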

18. Verbs in WordNet
hypernym: the verb Y is a hypernym of the verb X if the activity X is a (kind of) Y (to perceive is a hypernym of to listen)
troponym: the verb Y is a troponym of the verb X if the activity Y is doing X in some manner (to lisp is a troponym of to talk)
entailment: the verb Y is entailed by X if by doing X you must be doing Y (to sleep is entailed by to snore)
coordinate terms: those verbs sharing a common hypernym (to lisp and to yell)

19. Adjectives/Adverbs in WordNet
Adjectives: related nouns; similar to; participle of verb
Adverbs: root adjectives

20. Knowledge Structure
Defined by hypernym, or IS A, relationships. Example:
dog, domestic dog, Canis familiaris
  => canine, canid
  => carnivore
  => placental, placental mammal, eutherian mammal
  => mammal
  => vertebrate, craniate
  => chordate
  => animal, animate being, beast, brute, creature, fauna
  => ...
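The chain above is what you get by repeatedly following the hypernym link. A minimal sketch, assuming a hypothetical single-parent map that holds only the links shown on the slide (the real chain continues past animal; for actual synsets, NLTK's `Synset.hypernym_paths()` returns such paths):

```python
# Hypothetical single-parent map holding only the links on the slide.
# Real WordNet synsets may have several hypernyms.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(word):
    """Follow IS A links upward until reaching a word with no recorded hypernym."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


print(" => ".join(hypernym_chain("dog")))
# dog => canine => carnivore => placental => mammal => vertebrate => chordate => animal
```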

21. Hypernym/Hyponym Inverse Relations
Hyponym == ISA
Hypernym == “contains the subset”
Examples:
  car is a hyponym of vehicle ⇔ vehicle is a hypernym of car
  dog is a hyponym of animal ⇔ animal is a hypernym of dog
Sometimes "superordinate" is used instead of "hypernym".

22. WordNet as an ontology
Hyponymy == ISA
Meronymy: the part-of relation (wheel is part of car, so wheel is a meronym of car)
Holonymy: the inverse of meronymy (car is a holonym of wheel)

23. Senses and Synonyms
    >>> from nltk.corpus import wordnet as wn
    >>> wn.synsets('motorcar')
    [Synset('car.n.01')]
One meaning: the first (01) noun (n) sense of car.
    >>> wn.synset('car.n.01').lemma_names()
    ['car', 'auto', 'automobile', 'machine', 'motorcar']
These are the synonymous words (or "lemmas") of that sense. (In NLTK 3 and later, lemma_names, definition, examples, lemmas, and name are methods, so they are called with parentheses.)
Reference: NLTK Book Chapter 2

24. Definitions and examples
    >>> wn.synset('car.n.01').definition()
    'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
    >>> wn.synset('car.n.01').examples()
    ['he needs a car to get to work']
Reference: NLTK Book Chapter 2

25. Senses of car
    >>> wn.synsets('car')
    [Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
    >>> for synset in wn.synsets('car'):
    ...     print(synset.lemma_names())
    ...
    ['car', 'auto', 'automobile', 'machine', 'motorcar']
    ['car', 'railcar', 'railway_car', 'railroad_car']
    ['car', 'gondola']
    ['car', 'elevator_car']
    ['cable_car', 'car']
Reference: NLTK Book Chapter 2
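Words and synsets form a many-to-many relation: one synset lists several lemmas, and one word such as car appears in several synsets. A reverse index captures the behavior seen in `wn.synsets('car')` and `wn.synsets('motorcar')`. The sketch uses only the lemma lists printed above as hypothetical toy data and says nothing about how NLTK actually stores WordNet.

```python
# Hypothetical toy data copied from the 'car' output above: synset name -> lemmas.
SYNSETS = {
    "car.n.01": ["car", "auto", "automobile", "machine", "motorcar"],
    "car.n.02": ["car", "railcar", "railway_car", "railroad_car"],
    "car.n.03": ["car", "gondola"],
    "car.n.04": ["car", "elevator_car"],
    "cable_car.n.01": ["cable_car", "car"],
}


def build_index(synsets):
    """Map each lemma to the list of synsets (senses) that contain it."""
    index = {}
    for name, lemmas in synsets.items():
        for lemma in lemmas:
            index.setdefault(lemma, []).append(name)
    return index


index = build_index(SYNSETS)
print(index["car"])       # all five senses
print(index["motorcar"])  # ['car.n.01']
print(index["gondola"])   # ['car.n.03']
```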

26. The WordNet Hierarchy
Hypernyms: up
Hyponyms: down
Meronyms: components
Holonyms: the things they are contained in
Reference: NLTK Book Chapter 2

27. Synonyms and Lemmas
    >>> motorcar = wn.synset('car.n.01')
    >>> types_of_motorcar = motorcar.hyponyms()
    >>> types_of_motorcar[26]
    Synset('ambulance.n.01')
    >>> sorted(lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas())
    ['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', … ]
(The position of ambulance.n.01 in the hyponyms() list can differ between NLTK/WordNet versions.)
Reference: NLTK Book Chapter 2

28. Meronyms and Holonyms
    >>> wn.synset('tree.n.01').part_meronyms()
    [Synset('burl.n.02'), Synset('crown.n.07'), Synset('stump.n.01'), Synset('trunk.n.01'), Synset('limb.n.02')]
    >>> wn.synset('tree.n.01').substance_meronyms()
    [Synset('heartwood.n.01'), Synset('sapwood.n.01')]
    >>> wn.synset('tree.n.01').member_holonyms()
    [Synset('forest.n.01')]
Reference: NLTK Book Chapter 2

29. Noun senses of mint
    >>> for synset in wn.synsets('mint', wn.NOUN):
    ...     print(synset.name() + ':', synset.definition())
    ...
    batch.n.02: (often followed by `of') a large number or amount or extent
    mint.n.02: any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
    mint.n.03: any member of the mint family of plants
    mint.n.04: the leaves of a mint plant used fresh or candied
    mint.n.05: a candy that is flavored with a mint oil
    mint.n.06: a plant where money is coined by authority of the government
Reference: NLTK Book Chapter 2

30. Entailments
Walking entails stepping:
    >>> wn.synset('walk.v.01').entailments()
    [Synset('step.v.01')]
    >>> wn.synset('eat.v.01').entailments()
    [Synset('swallow.v.01'), Synset('chew.v.01')]
    >>> wn.synset('tease.v.03').entailments()
    [Synset('arouse.v.07'), Synset('disappoint.v.01')]
Reference: NLTK Book Chapter 2

31. Antonyms
Reference: NLTK Book Chapter 2

32. Semantic Similarity
    >>> right = wn.synset('right_whale.n.01')
    >>> orca = wn.synset('orca.n.01')
    >>> minke = wn.synset('minke_whale.n.01')
    >>> tortoise = wn.synset('tortoise.n.01')
    >>> novel = wn.synset('novel.n.01')
    >>> right.lowest_common_hypernyms(minke)
    [Synset('baleen_whale.n.01')]
    >>> right.lowest_common_hypernyms(orca)
    [Synset('whale.n.02')]
    >>> right.lowest_common_hypernyms(tortoise)
    [Synset('vertebrate.n.01')]
    >>> right.lowest_common_hypernyms(novel)
    [Synset('entity.n.01')]
Reference: NLTK Book Chapter 2
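The lowest common hypernym is the most specific node lying on both words' paths to the root. In a tree (one hypernym per node) it is simply the first node on one word's upward path that also appears on the other's. The sketch below assumes a hand-built, heavily collapsed fragment of the noun hierarchy: intermediate nodes such as `rorqual` and `toothed_whale` are assumptions and many real levels are skipped, so only the resulting hypernyms, not the depths, line up with the slide.

```python
# Hypothetical, heavily collapsed fragment: child -> single hypernym.
PARENT = {
    "right_whale": "baleen_whale",
    "minke_whale": "rorqual",        # assumed intermediate node
    "rorqual": "baleen_whale",
    "orca": "toothed_whale",         # levels collapsed
    "baleen_whale": "whale",
    "toothed_whale": "whale",
    "whale": "vertebrate",           # many levels collapsed
    "tortoise": "vertebrate",        # many levels collapsed
    "vertebrate": "entity",          # many levels collapsed
    "novel": "entity",               # many levels collapsed
}


def ancestors(node):
    """Path from node up to the root, node first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_hypernym(a, b):
    """First node on a's upward path that is also on b's upward path."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    return None  # the two nodes share no root


for other in ["minke_whale", "orca", "tortoise", "novel"]:
    print(other, "->", lowest_common_hypernym("right_whale", other))
# minke_whale -> baleen_whale
# orca -> whale
# tortoise -> vertebrate
# novel -> entity
```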

33. Generality/Specificity and Depth
    >>> wn.synset('baleen_whale.n.01').min_depth()
    14
    >>> wn.synset('whale.n.02').min_depth()
    13
    >>> wn.synset('vertebrate.n.01').min_depth()
    8
    >>> wn.synset('entity.n.01').min_depth()
    0
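Depth measures specificity: entity.n.01, the root, has depth 0, and more specific synsets such as baleen_whale.n.01 sit deeper. Because a synset can have more than one hypernym, there can be several paths to the root, and min_depth() reports the shortest one. A minimal breadth-first sketch over a hypothetical toy graph with abstract node names; the longest-path counterpart is included for contrast (NLTK also provides a `max_depth()` method):

```python
from collections import deque

# Hypothetical toy graph: node -> list of hypernyms; "entity" is the root.
HYPERNYMS = {
    "entity": [],
    "a": ["entity"],
    "b": ["a"],
    "c": ["b"],
    "x": ["c", "a"],  # two hypernyms: 4 edges to the root via c, 2 via a
}


def min_depth(node, hypernyms):
    """Edges on the shortest upward path from node to a node with no hypernyms."""
    queue = deque([(node, 0)])
    seen = {node}
    while queue:
        current, depth = queue.popleft()
        if not hypernyms[current]:
            return depth  # BFS reaches the nearest root first
        for parent in hypernyms[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    raise ValueError("no root reachable from " + node)


def max_depth(node, hypernyms):
    """Edges on the longest upward path from node to a root."""
    parents = hypernyms[node]
    return 0 if not parents else 1 + max(max_depth(p, hypernyms) for p in parents)


print(min_depth("x", HYPERNYMS), max_depth("x", HYPERNYMS))  # 2 4
```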

34. Similarity Scores from Right Whale
    >>> right.path_similarity(minke)
    0.25
    >>> right.path_similarity(orca)
    0.16666666666666666
    >>> right.path_similarity(tortoise)
    0.076923076923076927
    >>> right.path_similarity(novel)
    0.043478260869565216
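path_similarity turns the length d of the shortest path between two synsets in the hypernym/hyponym graph into a score, 1 / (d + 1), so a synset compared with itself scores 1 and the score falls as the path grows. Read backwards, the scores above correspond to distances of 3, 5, 12, and 22 edges (0.25 = 1/4, 0.1667 = 1/6, 0.0769 = 1/13, 0.0435 = 1/23). The sketch below applies the formula to a hypothetical tree fragment whose intermediate nodes (`rorqual`, `dolphin`, `toothed_whale`) are assumptions chosen for illustration; in a tree the undirected shortest path always passes through the lowest common hypernym.

```python
from collections import deque

# Hypothetical tree fragment: child -> hypernym. Intermediate nodes are assumptions.
HYPERNYM = {
    "right_whale": "baleen_whale",
    "minke_whale": "rorqual",
    "rorqual": "baleen_whale",
    "orca": "dolphin",
    "dolphin": "toothed_whale",
    "toothed_whale": "whale",
    "baleen_whale": "whale",
}


def neighbours(graph):
    """Undirected adjacency: each edge is a hypernym or hyponym link."""
    adj = {}
    for child, parent in graph.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


def shortest_path_length(a, b, adj):
    """Breadth-first search; returns None if b is unreachable from a."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b, graph=HYPERNYM):
    """1 / (shortest path length + 1), or None when no path exists."""
    d = shortest_path_length(a, b, neighbours(graph))
    return None if d is None else 1 / (d + 1)


print(path_similarity("right_whale", "minke_whale"))  # 0.25
print(path_similarity("right_whale", "orca"))         # 0.16666666666666666
```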

35. Googlecode - HowTo
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
WordNet Interface
    >>> from nltk.corpus import wordnet as wn
Reference: http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html

