Chris Biemann, Stefan Bordag, Uwe Quasthoff


1 Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences
Chris Biemann, Stefan Bordag, Uwe Quasthoff University of Leipzig, NLP Department LREC 2004, Learning & Acquisition (II), 27th of May 2004

2 Sets of Words
Our goal is the automatic extension of homogeneous word sets, e.g. WordNet synsets or small subtrees of some hierarchy.
We collect methods and apply them, possibly in combination.
Thought experiment: the computer as „associator“:
- Input: some example concepts
- Detection of the relation
- Output of additional instances
This can be done semi-supervised.
Necessary:
- very large text corpus
- features
- methods
Chris Biemann

3 Statistical Co-occurrences
Co-occurrence: occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbors).
Significant co-occurrences reflect relations between words.
Significance measure (log-likelihood) [formula not preserved in the transcript], defined over:
- k is the number of sentences containing a and b together
- ab is (number of sentences with a) * (number of sentences with b)
- n is the total number of sentences in the corpus
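The formula itself did not survive in the transcript. As a hedged illustration, the sketch below computes a Poisson-based significance from exactly the three quantities listed (k, ab, n), using x = ab/n as the expected joint count under independence. Whether this is the formula shown on the slide is an assumption, and the function name `poisson_significance` is invented for this example.

```python
import math

def poisson_significance(count_a, count_b, k, n):
    """Significance of words a and b co-occurring in k sentences.

    count_a, count_b: number of sentences containing a, resp. b
    k: number of sentences containing a and b together
    n: total number of sentences in the corpus (n > 1)
    Assumed form: (x - k*ln(x) + ln(k!)) / ln(n) with x = ab/n.
    """
    if k <= 0:
        return 0.0  # never seen together: no evidence of a relation
    x = count_a * count_b / n  # expected joint count under independence
    # math.lgamma(k + 1) equals ln(k!) and avoids huge factorials
    return (x - k * math.log(x) + math.lgamma(k + 1)) / math.log(n)

# Two words that each occur in 100 of 1,000,000 sentences:
often = poisson_significance(100, 100, 50, 1_000_000)
rarely = poisson_significance(100, 100, 5, 1_000_000)
print(often > rarely)
```

Under this measure, a pair seen together far more often than the expected x sentences receives a high score, which is what a ranking of co-occurrences needs.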

4 Iterating Co-occurrences
- (sentence-based) co-occurrences of first order: words that co-occur significantly often in sentences
- co-occurrences of second order: words that co-occur significantly often in collocation sets of first order
- co-occurrences of n-th order: words that co-occur significantly often in collocation sets of (n-1)th order
When calculating a higher order, the significance values of the preceding order are not relevant. A co-occurrence set consists of the N highest-ranked co-occurrences of a word.
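The iteration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: raw joint counts with a threshold `min_count` stand in for a real significance measure, and the names `cooccurrence_sets`, `iterate_cooccurrences`, `top_n` and `min_count` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_sets(units, top_n, min_count=2):
    """Map each word to its top_n co-occurring words over the given units.

    units: iterable of word collections (sentences for order 1,
    co-occurrence sets of the preceding order for higher orders).
    """
    pair_counts = Counter()
    for unit in units:
        for a, b in combinations(sorted(set(unit)), 2):
            pair_counts[(a, b)] += 1
    ranked = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:  # crude stand-in for "significant"
            ranked.setdefault(a, []).append((count, b))
            ranked.setdefault(b, []).append((count, a))
    # Keep only the words of the top_n highest ranked pairs; the scores
    # themselves are dropped, as they are not used by the next order.
    return {
        word: {other for _, other in
               sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_n]}
        for word, pairs in ranked.items()
    }

def iterate_cooccurrences(sentences, order, top_n=10, min_count=2):
    """Compute co-occurrence sets of the requested order (order >= 1)."""
    sets = cooccurrence_sets(sentences, top_n, min_count)
    for _ in range(order - 1):
        # The sets of the preceding order play the role of sentences.
        sets = cooccurrence_sets(sets.values(), top_n, min_count)
    return sets

sentences = [
    ["dog", "barking"], ["dog", "barking"], ["dog", "bite"], ["dog", "bite"],
    ["terrier", "barking"], ["terrier", "barking"],
    ["terrier", "bite"], ["terrier", "bite"],
]
print(iterate_cooccurrences(sentences, 1)["dog"])
print(iterate_cooccurrences(sentences, 2)["dog"])
```

In this toy corpus, dog and terrier never share a sentence, so they are not first-order co-occurrences, but they share the same first-order contexts and therefore become second-order co-occurrences.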

5 Constructed Example I
[Table: first-order (Ord 1) co-occurrence matrix over the words dog, terrier, cat, mouse, barking, bite, yelp; the cell values are not recoverable from the transcript.]

6 Constructed Example II
[Tables: second-order (Ord 2) and third-order (Ord 3) co-occurrence matrices over the words dog, terrier, cat, mouse, barking, bite, yelp; the cell values are not recoverable from the transcript.]

7 Properties of Iterated Co-occurrences
- after some iterations the sets remain more or less stable
- the sets are somewhat semantically homogeneous
- sometimes, they have nothing to do with the reference word
- calculations were performed up to the 10th order
Example: TOP 20 NB-collocations of 10th order for „erklärte“ [explained]:
sagte, schwärmte, lobt, schimpfte, meinte, jubelte, lobte, resümierte, schwärmt, Reinhard Heß, ärgerte, kommentierte, urteilte, analysierte, bilanzierte, freute, freute sich, Bundestrainer, freut, gefreut
[said, enthused, praises, grumbled, opined, was jubilant, praised, summarized, enthuses, Reinhard Hess, annoyed, commented, judged, analyzed, took stock, pleased, was pleased, coach of the national team, is pleased, been pleased]

8 Mapping co-occurrences to graphs
- For all words having co-occurrences, form nodes in a graph.
- Connect them all by edges, and initialize each edge weight with 0.
- For every co-occurrence of two words in a sentence, increase the edge weight by the significance.
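A minimal sketch of this graph construction is given below, assuming a caller-supplied `significance(a, b)` function. The nested-dictionary representation and the name `build_cooccurrence_graph` are choices made for the example, and absent edges are treated as having weight 0 rather than being stored explicitly.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(sentences, significance):
    """Build an undirected weighted graph from sentence co-occurrences.

    sentences: iterable of word lists
    significance: hypothetical callable (a, b) -> float
    Returns graph[a][b] == graph[b][a] == accumulated edge weight.
    """
    # Missing edges read as 0.0, which mirrors "initialize with 0".
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            weight = significance(a, b)
            graph[a][b] += weight
            graph[b][a] += weight
    return graph

toy_sentences = [["dog", "barking"], ["dog", "barking"], ["cat", "mouse"]]
toy_graph = build_cooccurrence_graph(toy_sentences, lambda a, b: 1.5)
print(toy_graph["dog"]["barking"])
```

With a constant significance, the edge weight is simply proportional to the number of sentences in which the two words appear together.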

9 First Iteration Step
The two black nodes A and B get connected in this step if there are many nodes C that are connected to both A and B. The more Cs, the higher the weight of the new edge.
[Figure legend: existing connection, new connection]
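One way to read this step is as counting shared neighbours. The sketch below is an illustration under that assumption: the new edge weight is the number of common neighbours C, and the threshold `min_shared` (a made-up parameter) decides what counts as "many".

```python
from itertools import combinations

def iteration_step(neighbours, min_shared=2):
    """Connect nodes that share at least min_shared neighbours.

    neighbours: dict mapping each node to the set of its neighbours
    Returns new_edges[a][b] == number of nodes connected to both a and b.
    """
    new_edges = {}
    for a, b in combinations(sorted(neighbours), 2):
        shared = len(neighbours[a] & neighbours[b])
        if shared >= min_shared:
            # The more shared nodes, the higher the weight of the new edge.
            new_edges.setdefault(a, {})[b] = shared
            new_edges.setdefault(b, {})[a] = shared
    return new_edges

toy_neighbours = {
    "A": {"C1", "C2", "C3"},
    "B": {"C1", "C2"},
    "C1": {"A", "B"},
    "C2": {"A", "B"},
    "C3": {"A"},
}
step = iteration_step(toy_neighbours)
print(step["A"]["B"])
```

Applying `iteration_step` again to the neighbour sets of the resulting graph corresponds to the next iteration step.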

10 Second Iteration Step
The two black nodes A and B get connected in this step if there are many (dark grey) nodes Ds that are connected to both A and B. The connections between the nodes Ds and the nodes A and B were constructed because of (light grey) nodes Es and Fs, respectively.
[Figure legend: former connection, existing connection, new connection; node labels A, B, Ds, Es, Fs]

11 Collapsing bridging nodes
The upper bound for the path length in iteration n is 2^n, since each step joins two edges of the preceding graph. However, some of the bridging nodes collapse, giving rise to self-preserving clusters of arbitrary path length, which are invariant under iteration.
[Figure: the upper 5 nodes form an invariant cluster; A and B are being absorbed by this cluster.]

12 Examples of Iterated Co-occurrences
Order | Reference word | TOP-10 collocations
N2 | wine | wines, champagne, beer, water, tea, coffee, Wine, alcoholic, beers, cider
S10 | wine | wines, grape, sauvignon, chardonnay, noir, pinot, cabernet, spicy, bottle, grapes
S1 | ringing | phone, bells, phones, hook, bell, endorsement, distinctive, ears, alarm, telephone
S2 | ringing | rung, Centrex, rang, phone, sounded, bell, ring, FaxxMaster, sound, tolled
S4 | ringing | sounded, rung, rang, tolled, tolling, sound, tone, toll, ring, doorbell
(order not preserved) | pressing | Ctrl, Shift, press, keypad, keys, key, keyboard, you, cursor, menu, PgDn, keyboards, numeric, Alt, Caps, CapsLock, NUMLOCK, NumLock, Scroll

13 Intersection of Co-occurrence Sets: resolving ambiguity
[Figure: overlapping co-occurrence sets of the reference words Herz-Bube (jack of hearts), Becker and Stich. Stich is ambiguous between a trick in card games and the tennis player Michael Stich. The assignment of the word groups below to regions of the figure is not recoverable from the transcript.]
- bedient, folgenden, gereizt, Karo-Buben, Karo-Dame, Karo-König, Karte, Karten, Kreuz-Ass, Kreuz-Dame, Kreuz-Hand, Kreuz-König, legt, Mittelhand, Null ouvert, Pik, Pik-Ass, Pik-Dame, schmiert, Skat, spielt, Spielverlauf, sticht, übernimmt, zieht
- Agassi, Australian Open, Bindewald, Boris, Break, Chang, Dickhaut, gewann, Ivanisevic, Kafelnikow, Kiefer, Komljenovic, Leimen, Matchball, Michael Stich, Monte Carlo, Prinosil, Sieg, Spiel, spielen, Steeb, Teamchef
- Achtelfinale, Aufschlag, Boris Becker, Daviscup, Doppel, DTB, Edberg, Finale, Graf, Haas, Halbfinale, Match, Pilic, Runde, Sampras, Satz, Tennis, Turnier, Viertelfinale, Weltrangliste, Wimbledon
- Alleinspieler, Herz, Herz-Dame, Herz-König, Hinterhand, Karo, Karo-As, Karo-Bube, Kreuz-As, Kreuz-Bube, Pik-As, Pik-Bube, Pik-König, Vorhand
- Becker, Courier, Einzel, Elmshorn, French Open, Herz-As, ins, Kafelnikow, Karbacher, Krajicek, Kreuz-As, Kreuz-Bube, Michael Stich, Mittelhand, Pik-As, Pik-Bube, Pik-König

14 Example: NB-collocations of 2nd order for warm, kühl, kalt
Intersecting the collocation sets for warm, kühl, kalt [warm, cool, cold] and filtering for adjectives results in:
abgekühlt, aufgeheizt, eingefroren, erhitzt, erwärmt, gebrannt, gelagert, heiß, heruntergekühlt, verbrannt, wärmer
[cooled down, heated, frozen, heated up, warmed, burned, stored, hot, chilled down, burned, warmer]
The emotional reading „abweisend“ [dismissive] of kühl, kalt is eliminated.

15 Detection of X-onyms: synonyms, antonyms, (co-)hyponyms, ...
Idea:
- The intersection of the co-occurrence sets of two X-onyms as reference words should contain X-onyms.
- Lexical ambiguity of one reference word does not degrade the result set.
Method:
- detect the word class of the reference words
- calculate co-occurrences for the reference words
- filter the co-occurrences w.r.t. the word class of the reference words (by means of POS tags)
- intersect the co-occurrence sets
- output the result
Ranking can be realized over the significance values of the co-occurrences.
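The method above can be sketched as follows. This is an illustration under stated assumptions: the co-occurrence sets and POS tags are given as plain dictionaries (`cooc` and `pos_of` are hypothetical inputs), and ranking by the summed significance is one possible reading of the ranking remark, not a detail confirmed by the slides.

```python
def xonym_candidates(reference_words, cooc, pos_of):
    """Return X-onym candidates for the reference words, best first.

    reference_words: list of words assumed to stand in the same relation
    cooc: dict word -> {co-occurring word: significance}
    pos_of: dict word -> POS tag
    """
    # Step 1: detect the word class; give up if the references disagree.
    classes = {pos_of[word] for word in reference_words}
    if len(classes) != 1:
        return []
    (word_class,) = classes
    # Steps 2 and 3: take each co-occurrence set, keep the same word class.
    filtered = [
        {w: sig for w, sig in cooc[ref].items() if pos_of.get(w) == word_class}
        for ref in reference_words
    ]
    # Step 4: intersect the filtered sets and drop the references themselves.
    common = set(filtered[0]).intersection(*filtered[1:])
    common -= set(reference_words)
    # Step 5: rank by summed significance (an assumption), ties by spelling.
    return sorted(common, key=lambda w: (-sum(f[w] for f in filtered), w))

toy_cooc = {
    "kühl": {"heiß": 4.0, "abweisend": 6.0, "Wetter": 7.0, "erwärmt": 1.0},
    "warm": {"heiß": 5.0, "Wetter": 8.0, "erwärmt": 3.0},
}
toy_pos = {
    "kühl": "ADJ", "warm": "ADJ", "heiß": "ADJ",
    "abweisend": "ADJ", "erwärmt": "ADJ", "Wetter": "NOUN",
}
print(xonym_candidates(["kühl", "warm"], toy_cooc, toy_pos))
```

In the toy data, the noun Wetter is removed by the POS filter, and abweisend drops out because it co-occurs with only one of the two reference words.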

16 Mini-Evaluation
Experiments for different data sources, NB-collocations of 2nd and 3rd order:
- the fraction of X-onyms in the TOP 5 is higher than in the TOP 10, which suggests that the ranking method makes sense
- the intersection of 2nd-order and 3rd-order collocations is almost always empty, which suggests that different orders exhibit different relations
- quantity: satisfactory, more through larger corpora
- quality: not precise enough for unsupervised extension

17 Word Sets for Thesaurus Expansion
Application: thesaurus expansion
- start set: [warm, kalt] [warm, cold]
  result set: [heiß, wärmer, kälter, erwärmt, gut, heißer, hoch, höher, niedriger, schlecht, frei]
  [hot, warmer, colder, warmed, good, hotter, high, higher, lower, bad, free]
- start set: [gelb, rot] [yellow, red]
  result set: [blau, grün, schwarz, grau, bunt, leuchtend, rötlich, braun, dunkel, rotbraun, weiß]
  [blue, green, black, grey, colorful, bright, reddish, brown, dark, red-brown, white]
- start set: [Mörder, Killer] [murderer, killer]
  result set: [Täter, Straftäter, Verbrecher, Kriegsverbrecher, Räuber, Terroristen, Mann, Mitglieder, Männer, Attentäter]
  [offender, delinquent, criminal, war criminal, robber, terrorists, man, members, men, assassin]

18 More Examples in English
Intersection of N2-order collocation sets

19 Questions? Thank you!

