Chris Biemann, Stefan Bordag, Uwe Quasthoff


Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences Chris Biemann, Stefan Bordag, Uwe Quasthoff University of Leipzig, NLP Department LREC 2004, Learning & Acquisition (II), 27th of May 2004

Sets of Words
Our goal is the automatic extension of homogeneous word sets, e.g. WordNet synsets or small subtrees of some hierarchy. We collect methods and apply them, possibly in combination.
Thought experiment: the computer as „associator":
- Input: some example concepts
- Detection of the relation
- Output of additional instances
This can be done semi-supervised. Necessary: a very large text corpus, features, methods. Chris Biemann

Statistical Co-occurrences
Occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbours). Significant co-occurrences reflect relations between words.
Significance measure (log-likelihood):
- k is the number of sentences containing a and b together
- ab is (number of sentences with a) * (number of sentences with b)
- n is the total number of sentences in the corpus
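The formula itself did not survive transcription; only the variable definitions above remain. A minimal sketch, assuming the Poisson-approximated log-likelihood measure used in the Leipzig corpus tools, sig(a, b) = x - k·ln x + ln k! with x = ab/n (an assumption, not reproduced from the slide):

```python
import math

def significance(k: int, n_a: int, n_b: int, n: int) -> float:
    """Poisson-based log-likelihood significance of seeing words a and b
    together in k sentences, given n_a sentences containing a, n_b
    sentences containing b, and n sentences in total."""
    if k == 0:
        return 0.0
    x = n_a * n_b / n  # expected joint frequency (the slide's ab / n)
    # log(k!) computed via lgamma to avoid overflow for large k
    return x - k * math.log(x) + math.lgamma(k + 1)
```

The more often a and b co-occur beyond chance expectation, the larger the value; pairs below a significance threshold are discarded.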

Iterating Co-occurrences (sentence-based)
- Co-occurrences of first order: words that co-occur significantly often in sentences
- Co-occurrences of second order: words that co-occur significantly often in collocation sets of first order
- Co-occurrences of n-th order: words that co-occur significantly often in collocation sets of (n-1)-th order
When calculating a higher order, the significance values of the preceding order are not relevant. A co-occurrence set consists of the N highest-ranked co-occurrences of a word.
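The iteration step can be sketched as follows. This is a simplified sketch: the collocation sets of order n-1 play the role the sentences played at order 1, pairs are ranked by how many sets they share, and each resulting set is truncated to the N best partners (a real system would apply the significance measure instead of raw counts; all names are illustrative):

```python
from collections import Counter
from itertools import combinations

def next_order(cooc_sets, top_n=3):
    """Compute order-n co-occurrence sets from order-(n-1) sets:
    two words co-occur at order n if they appear together in the
    collocation sets of the previous order."""
    pair_counts = Counter()
    for words in cooc_sets.values():
        for a, b in combinations(sorted(words), 2):
            pair_counts[(a, b)] += 1
    # collect, per word, its partners ranked by shared-set count
    partners = {}
    for (a, b), c in pair_counts.items():
        partners.setdefault(a, Counter())[b] = c
        partners.setdefault(b, Counter())[a] = c
    return {w: {p for p, _ in cnt.most_common(top_n)}
            for w, cnt in partners.items()}
```

Applied repeatedly, this yields the higher-order sets discussed below; the significance values of the previous order are deliberately not carried over.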

Constructed Example I
[Slide shows an order-1 co-occurrence matrix over the words dog, terrier, cat, mouse, barking, bite, yelp; the cell values were lost in transcription.]

Constructed Example II
[Slide shows the corresponding order-2 and order-3 co-occurrence matrices over the words dog, terrier, cat, mouse, barking, bite, yelp; the cell values were lost in transcription.]

Properties of Iterated Co-occurrences
- After some iterations the sets remain more or less stable
- The sets are somewhat semantically homogeneous
- Sometimes they have nothing to do with the reference word
- Calculations were performed up to the 10th order
Example, TOP 20 NB-collocations of 10th order for „erklärte" [explained]: sagte, schwärmte, lobt, schimpfte, meinte, jubelte, lobte, resümierte, schwärmt, Reinhard Heß, ärgerte, kommentierte, urteilte, analysierte, bilanzierte, freute, freute sich, Bundestrainer, freut, gefreut [said, enthused, praises, grumbled, opined, was jubilant, praised, summarized, enthuses, Reinhard Heß, was annoyed, commented, judged, analyzed, summed up, was pleased, was pleased, coach of the national team, is pleased, been pleased]

Mapping Co-occurrences to Graphs
For all words having co-occurrences, form nodes in a graph. Connect them all by edges, initializing each edge weight with 0. For every co-occurrence of two words in a sentence, increase the edge weight by the significance.
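The construction can be sketched directly. The `sig` callable here is an assumed stand-in for the log-likelihood significance measure; in this sketch edges simply start at 0 implicitly via a defaultdict:

```python
from collections import defaultdict
from itertools import combinations

def build_graph(sentences, sig):
    """Build the co-occurrence graph: for every pair of distinct words
    sharing a sentence, add that pair's significance to its edge
    weight.  Keys are alphabetically ordered word pairs."""
    weights = defaultdict(float)
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            weights[(a, b)] += sig(a, b)
    return weights
```

The iteration steps on the following slides then operate on this weighted graph rather than on the corpus.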

First Iteration Step
The two black nodes A and B get connected in this step if there are many nodes C connected to both A and B. The more such nodes C, the higher the weight of the new edge. [Figure legend: existing connection, new connection]

Second Iteration Step
The two black nodes A and B get connected in this step if there are many (dark grey) nodes D connected to both A and B. The connections between the nodes D and the nodes A and B were in turn constructed because of the (light grey) nodes E and F, respectively. [Figure legend: former connection, existing connection, new connection]

Collapsing Bridging Nodes
The upper bound for the path length in iteration n is 2^n. However, some of the bridging nodes collapse, giving rise to self-sustaining clusters of arbitrary path length, which are invariant under iteration. [Figure: the upper 5 nodes form an invariant cluster; A and B are being absorbed by this cluster]

Examples of Iterated Co-occurrences (order, reference word: TOP-10 collocations)
- N2, wine: wines, champagne, beer, water, tea, coffee, Wine, alcoholic, beers, cider
- S10, wine: wines, grape, sauvignon, chardonnay, noir, pinot, cabernet, spicy, bottle, grapes
- S1, ringing: phone, bells, phones, hook, bell, endorsement, distinctive, ears, alarm, telephone
- S2, ringing: rung, Centrex, rang, phone, sounded, bell, ring, FaxxMaster, sound, tolled
- S4, ringing: sounded, rung, rang, tolled, tolling, sound, tone, toll, ring, doorbell
- pressing (order labels lost in transcription): Ctrl, Shift, press, keypad, keys, key, keyboard, you, cursor, menu, PgDn, keyboards, numeric, Alt, Caps, CapsLock, NUMLOCK, NumLock, Scroll

Intersection of Co-occurrence Sets: Resolving Ambiguity
Example with the German reference words Herz-Bube [jack of hearts], Becker (the tennis player Boris Becker) and the ambiguous Stich (card-game trick / the tennis player Michael Stich). The slide contrasts co-occurrence sets from the card-game and tennis domains; its multi-column layout was lost in transcription, so the sets are listed here as flat word lists:
- bedient - folgenden - gereizt - Karo-Buben - Karo-Dame - Karo-König - Karte - Karten - Kreuz-Ass - Kreuz-Dame - Kreuz-Hand - Kreuz-König - legt - Mittelhand - Null ouvert - Pik - Pik-Ass - Pik-Dame - schmiert - Skat - spielt - Spielverlauf - sticht - übernimmt - zieht
- Agassi - Australian Open - Bindewald - Boris - Break - Chang - Dickhaut - gewann - Ivanisevic - Kafelnikow - Kiefer - Komljenovic - Leimen - Matchball - Michael Stich - Monte Carlo - Prinosil - Sieg - Spiel - spielen - Steeb - Teamchef Stich
- Achtelfinale - Aufschlag - Boris Becker - Daviscup - Doppel - DTB - Edberg - Finale - Graf - Haas - Halbfinale - Match - Pilic - Runde - Sampras - Satz - Tennis - Turnier - Viertelfinale - Weltrangliste - Wimbledon
- Alleinspieler - Herz - Herz-Dame - Herz-König - Hinterhand - Karo - Karo-As - Karo-Bube - Kreuz-As - Kreuz-Bube - Pik-As - Pik-Bube - Pik-König - Vorhand
- Becker - Courier - Einzel - Elmshorn - French Open - Herz-As - ins - Kafelnikow - Karbacher - Krajicek - Kreuz-As - Kreuz-Bube - Michael Stich - Mittelhand - Pik-As - Pik-Bube - Pik-König - Stich

Example: NB-collocations of 2nd Order for warm, kühl, kalt
Intersecting the collocation sets for warm, kühl, kalt [warm, cool, cold] and filtering for adjectives results in: abgekühlt, aufgeheizt, eingefroren, erhitzt, erwärmt, gebrannt, gelagert, heiß, heruntergekühlt, verbrannt, wärmer [cooled down, heated up, frozen, heated, warmed, burnt, stored, hot, cooled off, burned, warmer]
The emotional reading „abweisend" [aloof] of kühl, kalt is eliminated.

Detection of X-onyms (synonyms, antonyms, (co-)hyponyms, ...)
Idea: the intersection of the co-occurrence sets of two X-onyms used as reference words should contain further X-onyms; lexical ambiguity of one reference word does not deteriorate the result set.
Method:
- detect the word class of the reference words
- calculate co-occurrences for the reference words
- filter the co-occurrences w.r.t. the word class of the reference words (by means of POS tags)
- intersect the co-occurrence sets
- output the result
Ranking can be realized over the significance values of the co-occurrences.
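The method can be sketched as a small helper. `cooc_sets` and `pos_tags` are assumed inputs here, standing in for the output of the co-occurrence calculation and a POS tagger respectively:

```python
def xonym_candidates(ref1, ref2, cooc_sets, pos_tags, target_pos):
    """Intersect the co-occurrence sets of two reference words and keep
    only candidates carrying the reference words' part of speech."""
    shared = cooc_sets.get(ref1, set()) & cooc_sets.get(ref2, set())
    return {w for w in shared if pos_tags.get(w) == target_pos}
```

Because a candidate must appear in both sets, a spurious sense of one reference word contributes little: its domain vocabulary rarely survives the intersection.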

Mini-Evaluation
Experiments for different data sources, NB-collocations of 2nd and 3rd order:
- the fraction of X-onyms in the TOP 5 is higher than in the TOP 10 → the ranking method makes sense
- the intersection of 2nd-order and 3rd-order collocations is almost always empty → different orders exhibit different relations
- quantity is satisfactory; larger corpora would yield more
- quality: not precise enough for unsupervised extension

Word Sets for Thesaurus Expansion
Application: thesaurus expansion.
- Start set: [warm, kalt] [warm, cold]
  Result set: [heiß, wärmer, kälter, erwärmt, gut, heißer, hoch, höher, niedriger, schlecht, frei] [hot, warmer, colder, warmed, good, hotter, high, higher, lower, bad, free]
- Start set: [gelb, rot] [yellow, red]
  Result set: [blau, grün, schwarz, grau, bunt, leuchtend, rötlich, braun, dunkel, rotbraun, weiß] [blue, green, black, grey, colorful, bright, reddish, brown, dark, red-brown, white]
- Start set: [Mörder, Killer] [murderer, killer]
  Result set: [Täter, Straftäter, Verbrecher, Kriegsverbrecher, Räuber, Terroristen, Mann, Mitglieder, Männer, Attentäter] [offender, delinquent, criminal, war criminal, robber, terrorists, man, members, men, assassin]
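A toy sketch of the expansion step, using simple voting over the start words' co-occurrence sets (names and English stand-in data are illustrative; the actual system ranks via significance values):

```python
from collections import Counter

def expand_set(start_words, cooc_sets, top=10):
    """Rank candidate words by how many start words they co-occur
    with, and return the `top` best candidates as the result set."""
    votes = Counter()
    for w in start_words:
        for cand in cooc_sets.get(w, ()):
            if cand not in start_words:
                votes[cand] += 1
    return [w for w, _ in votes.most_common(top)]
```

Candidates supported by several start words rise to the top, which is why a start set of two or three members already constrains the relation reasonably well.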

More Examples in English
Intersection of N2-order collocation sets.

Questions? THANK YOU! Chris Biemann