
1 Encyclopaedic Annotation of Text

2
• Entity-level difficulty
  ▪ Not all entities in a document may be in the reader’s knowledge space
• Lexical difficulty
  ▪ Arises from words that are difficult relative to the reader’s level
  ▪ Collaborate → work together; biennially → every two years
• Syntactic difficulty
  ▪ Arises from syntactic constructs that are complex relative to the reader’s level

3
• Annotate text with encyclopaedic references
  ▪ Find important concepts and entities
  ▪ Link the identified entities and concepts to their respective knowledge sources
• Entity Linking (EL) problem
  ▪ The task of linking name mentions in text to their referent entities in a knowledge base
• Entity Disambiguation (ED) problem
  ▪ A name mention may have many possible referents

4
• Word Sense Disambiguation
  ▪ Predicting the sense of a word in a sentence when the word may have multiple senses
  ▪ Maps a word to a sense
• EL/ED
  ▪ Models an encyclopaedia in which important words, phrases or entities on a page are linked to their respective informative pages

5
• Given a text, generate Wikipedia-like annotations automatically
Resource: Wikify! Linking Documents to Encyclopedic Knowledge, Rada Mihalcea and Andras Csomai

6
• Collaborative encyclopaedia
• Wikipedia article
  ▪ Defines and describes an entity or an event
  ▪ Consists of a hypertext document with hyperlinks to other pages within or outside Wikipedia
  ▪ Uniquely referenced by an identifier, e.g. counter for drinks → bar (counter)
• Hyperlink
  ▪ Unique identifier + anchor text
  ▪ “Henry Barnard, [[United States|American]] [[educationalist]], was born in [[Hartford, Connecticut]]”
• Disambiguation page
  ▪ Consists of links to the articles that define the different meanings of the entity
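The hyperlink structure on this slide (unique identifier plus anchor text) can be pulled out of wiki markup with a short regular expression. This is a minimal sketch that only handles the plain `[[Title]]` and `[[Title|anchor]]` forms shown in the example sentence; the function name `extract_links` is a hypothetical helper, not part of any Wikipedia tooling.

```python
import re

# Matches [[Target]] or [[Target|anchor text]] in wiki markup.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def extract_links(markup):
    """Return (article_title, anchor_text) pairs found in the markup."""
    links = []
    for match in WIKILINK.finditer(markup):
        target = match.group(1).strip()
        # Without a pipe, the anchor text is the article title itself.
        anchor = (match.group(2) or target).strip()
        links.append((target, anchor))
    return links

sentence = ("Henry Barnard, [[United States|American]] [[educationalist]], "
            "was born in [[Hartford, Connecticut]]")
for title, anchor in extract_links(sentence):
    print(f"{anchor!r} -> {title!r}")
```

Collected over many articles, such (title, anchor) pairs give both a sense-annotated corpus and a dictionary from surface forms to candidate articles.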

7

8
• Keyword extraction follows the Wikipedia manual
  ▪ Link to articles that provide a deeper understanding of the topic, such as technical terms, names and places
  ▪ Avoid linking terms that are unrelated to the main topic or that have no article to explain them
  ▪ Avoid too many links

9
• Supervised or unsupervised
• Candidate keywords should be limited to those that have a valid corresponding Wikipedia article
• Keyword vocabulary that contains only the Wikipedia article titles
  ▪ Augment the list with different morphological forms
  ▪ E.g. “dissecting” or “dissections” can be linked to the same article, “dissection”

10
• Unsupervised keyword extraction from a document
• Candidate extraction
  ▪ From the input document, extract all possible n-grams that are also present in the controlled vocabulary
• Keyword ranking
  ▪ Assign each candidate a score reflecting how likely it is to be a valuable keyphrase
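The two steps above can be sketched as follows. The slide does not say which score is used for ranking; this sketch assumes a keyphraseness-style ratio (how often a term is linked divided by how often it appears), which is one of the options discussed in the Wikify! paper. The vocabulary and all counts are made-up toy values, and the function names are hypothetical.

```python
import re

def ngrams(tokens, max_n):
    """Yield every contiguous n-gram (as a string) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def extract_candidates(text, vocabulary, max_n=3):
    """Keep only the n-grams that appear in the controlled vocabulary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {gram for gram in ngrams(tokens, max_n) if gram in vocabulary}

def rank_by_keyphraseness(candidates, link_counts, doc_counts):
    """Sort candidates by (#docs where linked) / (#docs where present)."""
    def score(term):
        return link_counts.get(term, 0) / max(doc_counts.get(term, 1), 1)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(candidates, key=lambda term: (-score(term), term))

# Toy controlled vocabulary and counts (hypothetical values).
vocabulary = {"entity linking", "knowledge base", "text", "task"}
link_counts = {"entity linking": 40, "knowledge base": 30, "text": 5, "task": 1}
doc_counts = {"entity linking": 50, "knowledge base": 60, "text": 1000, "task": 500}

text = "Entity linking is the task of linking mentions in text to a knowledge base."
candidates = extract_candidates(text, vocabulary)
ranked = rank_by_keyphraseness(candidates, link_counts, doc_counts)
print(ranked)
```

Frequent but rarely linked words such as "text" fall to the bottom of the ranking, which is the behaviour the "avoid too many links" guideline asks for.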

11

12

13

14
• Links can be treated as sense annotations
• Wikipedia data has larger coverage of sense annotations for entities (nouns)
  ▪ A huge number of named entities
  ▪ Multi-word expressions (e.g., mother church)

15
• Knowledge-driven methods
  ▪ Lesk algorithm: picks the most likely meaning for a word in a given context, based on a measure of contextual overlap between the dictionary definitions of the ambiguous word and the context
  ▪ Modelling Wikification as WSD
    - Dictionary definition → Wikipedia page
    - Context → paragraph in which the word occurs
• Data-driven methods
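The mapping above (dictionary definition to Wikipedia page, context to paragraph) can be illustrated with a simplified Lesk-style overlap count. This is a sketch, not the method of any particular paper: the stopword list is a tiny assumed set, the two "pages" are made-up one-sentence snippets, and the article titles are illustrative.

```python
import re

# Tiny assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "was", "and",
             "to", "for", "by", "with", "it", "that", "as"}

def content_words(text):
    """Lowercased word set with stopwords removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def lesk_wikify(context, candidate_pages):
    """Pick the candidate page whose text overlaps most with the context."""
    context_words = content_words(context)
    overlaps = {
        title: len(context_words & content_words(page_text))
        for title, page_text in candidate_pages.items()
    }
    best = max(overlaps, key=overlaps.get)
    return best, overlaps

# Hypothetical page snippets standing in for full Wikipedia articles.
candidate_pages = {
    "Chicago (typeface)": "Chicago is a sans-serif typeface designed for "
                          "Apple and used as the Macintosh menu font.",
    "Chicago (band)": "Chicago is an American rock band that released "
                      "many albums in the 1970s.",
}
context = "It's a version of Chicago, the standard classic Macintosh menu font."

best, overlaps = lesk_wikify(context, candidate_pages)
print(best, overlaps)
```

The overlap count is crude (no stemming, no weighting), which is one reason data-driven methods are usually preferred when enough linked text is available.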

16
• Local approaches
  ▪ Disambiguate each mention in a document separately
  ▪ Utilize clues such as the textual similarity between the document and each candidate disambiguation’s Wikipedia page
[Figure: document mentions connected to candidate labels by mention-to-label compatibility]

17 Michael Jeffrey Jordan (born February 17, 1963), also known by his initials, MJ, is an American former professional basketball player. Jordan joined the NBA's Chicago Bulls in 1984. Michael Jordan fuelled the success of Nike's Air Jordan sneakers. He also starred in the 1996 feature film Space Jam as himself.

18 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Resource: Local and Global Algorithms for Disambiguation to Wikipedia, Ratinov et al.

21 [Same example text as slide 18, annotated with the relation labels Used_In, Is_a, Succeeded and Released]

23
• Collective Entity Linking
[Figure: document mentions connected to candidate labels by mention-to-label compatibility; candidate labels connected to one another by inter-label topical coherence]
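The difference between local and collective linking can be shown with a toy exhaustive search that adds pairwise coherence between the chosen labels to the per-mention compatibility scores. This is only an illustrative sketch: all scores and article titles below are invented, and enumerating every assignment is feasible only for very small inputs.

```python
from itertools import product

def collective_link(candidates, local_score, coherence):
    """Return the joint assignment maximising local + pairwise coherence."""
    mentions = list(candidates)
    best_assignment, best_total = None, float("-inf")
    # Enumerate one candidate title per mention (exponential; toy use only).
    for titles in product(*(candidates[m] for m in mentions)):
        total = sum(local_score[(m, t)] for m, t in zip(mentions, titles))
        for i in range(len(titles)):
            for j in range(i + 1, len(titles)):
                pair = frozenset((titles[i], titles[j]))
                total += coherence.get(pair, 0.0)
        if total > best_total:
            best_assignment, best_total = dict(zip(mentions, titles)), total
    return best_assignment, best_total

# Hypothetical scores: locally "Chicago" looks like the city, but the
# typeface is far more coherent with the operating-system label.
candidates = {
    "Chicago": ["Chicago (city)", "Chicago (typeface)"],
    "MacOS": ["Classic Mac OS"],
}
local_score = {
    ("Chicago", "Chicago (city)"): 0.6,
    ("Chicago", "Chicago (typeface)"): 0.3,
    ("MacOS", "Classic Mac OS"): 0.9,
}
coherence = {
    frozenset(("Chicago (typeface)", "Classic Mac OS")): 0.8,
    frozenset(("Chicago (city)", "Classic Mac OS")): 0.1,
}

assignment, total = collective_link(candidates, local_score, coherence)
print(assignment, round(total, 2))
```

A purely local method would pick the city for "Chicago" here; the coherence term is what flips the decision to the typeface.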

24

25 [Figure: text document(s) (news, blogs, …) and Wikipedia articles]

26

27

28
• Many-to-one matching in a bipartite graph
[Figure: text document(s) (news, blogs, …) and Wikipedia articles]

29
• Γ is a solution to the problem
  ▪ A set of pairs (m, t)
  ▪ m: a mention in the document
  ▪ t: the matched Wikipedia title

30
• Local score of matching the mention to the title

31
• A “global” term: evaluates how good the structure of the solution is
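The formulas on slides 29 to 31 did not survive in this transcript. Assuming they follow the formulation in the cited Ratinov et al. paper, the objective that combines the local score with the global term can be sketched as below. The symbols φ (local score), Ψ (global term) and ψ (pairwise relatedness), and the fixed disambiguation context Γ′, are notation assumed here rather than taken from the slides.

```latex
% Joint objective: local mention-to-title scores plus a global coherence term.
\[
\Gamma^{*} = \arg\max_{\Gamma} \left[ \sum_{i=1}^{N} \phi(m_i, t_i) + \Psi(\Gamma) \right]
\]
% Tractable approximation: coherence is scored pairwise against a fixed
% disambiguation context \Gamma' instead of the full joint assignment.
\[
\Gamma^{*} \approx \arg\max_{\Gamma} \sum_{i=1}^{N} \left[ \phi(m_i, t_i) + \sum_{t_j \in \Gamma'} \psi(t_i, t_j) \right]
\]
```

Here N is the number of mentions and (m_i, t_i) are the pairs in Γ. With Ψ dropped, the objective decomposes per mention and reduces to a purely local approach.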


34

35
• Augment Mention List
• Construct Disambiguation Candidates
• Ranker
• Linker
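The last three stages named on this slide can be sketched roughly as follows. This is a heavily simplified illustration under stated assumptions: candidates come from a hypothetical anchor-text dictionary, the ranker takes the arg-max of a supplied score function, and the linker is reduced to a fixed threshold; mention-list augmentation is omitted. None of the names or numbers below come from the slides.

```python
def generate_candidates(mention, anchor_index):
    """Look up candidate titles for a surface form (case-insensitive)."""
    return anchor_index.get(mention.lower(), [])

def rank(mention, candidates, score):
    """Return the highest-scoring candidate, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda title: score(mention, title))

def link(mention, anchor_index, score, threshold=0.5):
    """Link to the top candidate only if its score clears the threshold."""
    best = rank(mention, generate_candidates(mention, anchor_index), score)
    if best is None or score(mention, best) < threshold:
        return None  # leave the mention unlinked
    return best

# Hypothetical anchor dictionary and scores, for illustration only.
anchor_index = {
    "chicago": ["Chicago (city)", "Chicago (typeface)"],
    "jordan": ["Michael Jordan", "Jordan (country)"],
}
prior = {
    ("chicago", "Chicago (city)"): 0.9,
    ("chicago", "Chicago (typeface)"): 0.05,
    ("jordan", "Michael Jordan"): 0.4,
    ("jordan", "Jordan (country)"): 0.45,
}

def score(mention, title):
    return prior.get((mention.lower(), title), 0.0)

for mention in ["Chicago", "Jordan", "Springfield"]:
    print(mention, "->", link(mention, anchor_index, score))
```

Separating ranking from linking lets the system decline to link when even the best candidate is weak, as with "Jordan" in this toy data.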

36

37

38

39

40
• Text(t): TF-IDF summary of the Wikipedia page for title t
• Context(t): TF-IDF summary of the context within which t is hyperlinked in Wikipedia
• Text(m): TF-IDF summary of the document d containing m
• Context(m): TF-IDF summary of the context window of m
• Local features
  ▪ cosine-sim(Text(t), Text(m))
  ▪ cosine-sim(Text(t), Context(m))
  ▪ cosine-sim(Context(t), Text(m))
  ▪ cosine-sim(Context(t), Context(m))
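The four local features can be computed as in the sketch below. It uses a small hand-rolled TF-IDF with a smoothed IDF (an assumption; the slides do not give the weighting) and plain dictionaries as sparse vectors. The four input texts are invented stand-ins for the page text, the link context, the document and the mention window.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(corpus):
    """Smoothed inverse document frequency over a list of texts."""
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(tokenize(doc)))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector; words missing from idf get weight 0."""
    counts = Counter(tokenize(text))
    return {w: c * idf.get(w, 0.0) for w, c in counts.items()}

def cosine_sim(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def local_features(text_t, context_t, text_m, context_m):
    """The four cosine features, in the order listed on the slide."""
    return [
        cosine_sim(text_t, text_m),
        cosine_sim(text_t, context_m),
        cosine_sim(context_t, text_m),
        cosine_sim(context_t, context_m),
    ]

# Invented example texts (not real Wikipedia content).
page_text = "Chicago is a typeface designed for the Macintosh menu font"
link_context = "menus on the classic Mac were set in Chicago"
document = ("It is a version of Chicago, the standard classic Macintosh "
            "menu font. Chicago was used by default for Mac menus.")
window = "a version of Chicago, the standard classic Macintosh menu font"

idf = build_idf([page_text, link_context, document, window])
features = local_features(tfidf(page_text, idf), tfidf(link_context, idf),
                          tfidf(document, idf), tfidf(window, idf))
print([round(f, 3) for f in features])
```

In a trained ranker these four values would be only part of the feature vector for each (mention, title) pair.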

41

42
• Wikipedia relatedness measures
  ▪ Normalized Google Distance
  ▪ Pointwise Mutual Information
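Both measures are commonly computed over the sets of pages that link to each of two articles. The sketch below assumes that link-set formulation, with PMI in the unlogged ratio form; the page IDs and the total page count are toy values, and the function names are hypothetical.

```python
import math

def ngd(links_a, links_b, total_pages):
    """Normalized Google Distance over link sets (lower = more related)."""
    a, b = len(links_a), len(links_b)
    common = len(links_a & links_b)
    if common == 0:
        return float("inf")  # no shared links: maximally distant
    denominator = math.log(total_pages) - math.log(min(a, b))
    if denominator == 0:
        return 0.0  # degenerate case: the smaller set covers every page
    return (math.log(max(a, b)) - math.log(common)) / denominator

def pmi(links_a, links_b, total_pages):
    """Pointwise mutual information ratio (higher = more related)."""
    a, b = len(links_a), len(links_b)
    if a == 0 or b == 0:
        return 0.0
    common = len(links_a & links_b)
    return (common / total_pages) / ((a / total_pages) * (b / total_pages))

# Toy link sets: IDs of pages that link to each article.
links_a = {1, 2, 3, 4}
links_b = {3, 4, 5}
total_pages = 100

print(round(ngd(links_a, links_b, total_pages), 4))
print(round(pmi(links_a, links_b, total_pages), 4))
```

Note the opposite orientations: NGD is a distance, so a relatedness score is often derived from it (for example by negating or subtracting from one), while PMI grows with relatedness.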

43

