
1 Automatic Extension of Feature-based Semantic Lexicons via Contextual Features March 10, 2005 29th Annual Conference of Gfkl, 2005 Chris Biemann University.


1 Automatic Extension of Feature-based Semantic Lexicons via Contextual Features
March 10, 2005
29th Annual Conference of the GfKl, 2005
Chris Biemann, University of Leipzig, Germany
Rainer Osswald, FernUniversität Hagen, Germany

2 Outline
Motivation: lexicon extension for semantic parsing
From co-occurrences to adjective profiles of nouns
Inheritance mechanism for semantic features
Results for complex classes
Results for binary classes and their combination
Discussion

3 Motivation
Semantic parsing aims at finding a semantic representation for a sentence.
As a prerequisite, semantic parsing needs semantic features of words.
Semantic features are obtained by manually creating lexicon entries, which is expensive in terms of time and money.
Given a certain number of manually created lexicon entries, it might be possible to train a classifier in order to find more entries.

4 HaGenLex: Semantic Lexicon for German
Size: 22700 entries, of these 11300 nouns and 6700 verbs
WORD             SEMANTIC CLASS
Aggressivität    nonment-dyn-abs-situation
Agonie           nonment-stat-abs-situation
Agrarprodukt     nat-discrete
Ägypter          human-object
Ahn              human-object
Ahndung          nonment-dyn-abs-situation
Ähnlichkeit      relation
Airbag           nonax-mov-art-discrete
Airbus           mov-nonanimate-con-potag
Airport          art-con-geogr
Ajatollah        human-object
Akademiker       human-object
Akademisierung   nonment-dyn-abs-situation
Akkordeon        nonax-mov-art-discrete
Akkreditierung   nonment-dyn-abs-situation
Akku             ax-mov-art-discrete
Akquisition      nonment-dyn-abs-situation
Akrobat          human-object
...

5 Characteristics of semantic classes in HaGenLex
In total, 50 semantic classes for nouns are constructed from allowed combinations of:
  16 semantic features (binary), e.g. HUMAN+, ARTIFICIAL-
  17 ontological sorts, e.g. concrete, abstract-situation, ...
[Diagram labels: sort (hierarchy), semantic features, semantic classes]

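6 Application: WOCADI-Parser
Example query: "Welche Bücher von Peter Jackson über Expertensysteme wurden bei Addison-Wesley seit 1985 veröffentlicht?" (Which books by Peter Jackson about expert systems have been published by Addison-Wesley since 1985?)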

7 Underlying Assumptions
Harris 1968, Distributional Hypothesis: semantic similarity is a function over the global contexts of words. The more similar the contexts, the more similar the words.
Projected onto nouns and adjectives: nouns of similar semantic classes are modified by similar adjectives.
The neighbouring co-occurrence relation between adjectives as left neighbours and nouns as right neighbours approximates typical head-modifier structures.

8 Neighbouring Co-occurrences and Profiles
Significant co-occurrences reflect relations between words.
To determine which co-occurrences are significant, a significance measure is used (here: log-likelihood).
In the following, we look at adjectives that appear significantly (i.e., typically) to the left of nouns, and at nouns that appear significantly to the right of adjectives.
The set of adjectives that co-occur significantly often to the left of a noun is called its adjective profile (the noun profile of an adjective is defined analogously).
For the experiments, we use the most recent German corpus of Projekt Deutscher Wortschatz, 500 million tokens.
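The slide names log-likelihood as the significance measure but does not give the formula. A minimal sketch, assuming the common G² log-likelihood ratio over a 2x2 contingency table of neighbour pairs; the function name and the argument names are illustrative and not taken from the presentation:

```python
import math


def log_likelihood(k_ab, k_a, k_b, n):
    """G^2 statistic for adjective a directly left of noun b.

    k_ab: number of times a occurs directly left of b
    k_a:  number of neighbour pairs with a on the left
    k_b:  number of neighbour pairs with b on the right
    n:    total number of neighbour pairs in the corpus
    """
    observed = [
        k_ab,                  # a and b
        k_a - k_ab,            # a without b
        k_b - k_ab,            # b without a
        n - k_a - k_b + k_ab,  # neither
    ]
    rows = [k_a, n - k_a]
    cols = [k_b, n - k_b]
    # Expected counts under independence, in the same cell order.
    expected = [rows[i] * cols[j] / n for i in (0, 1) for j in (0, 1)]
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

A pair would then enter a profile when its score exceeds some threshold; the threshold is not stated on the slide. Note that G² is also large for pairs that co-occur much less often than expected, so a real implementation would additionally check that the observed count is above the expected one.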

9 Example: neighbouring profiles
Amount: 125000 nouns, 25000 adjectives
Word          Adjective / noun profile
Buch          neu, erschienen, erst, neuest, jüngst, gut, geschrieben, letzt, zweit, vorliegend, gleichnamig, herausgegeben, nächst, dick, veröffentlicht, ...
Käse          gerieben, überbacken, kleinkariert, fett, französisch, fettarm, löchrig, holländisch, handgemacht, grün, würzig, selbstgemacht, produziert, schimmelig
Camembert     gebacken, fettarm, reif
überbacken    Schweinesteak, Aubergine, Blumenkohl, Käse
erlegt        Tier, Wild, Reh, Stück, Beute, Großwild, Wildkatzen, Büffel, Rehbock, Beutetier, Wal, Hirsch, Hase, Grizzly, Wildschwein, Thier, Eber, Bär, Mücke, ...
ganz          Leben, Bündel, Stück, Volk, Wesen, Vermögen, Herz, Heer, Arsenal, Dorf, Land, Können, Berufsleben, Paket, Kapitel, Stadtviertel, Rudel, Jahrzehnt, ...
Word (transl.)  Adjective / noun profile (translations)
book          new, published, first, newest, most recent, recently, good, written, last, second, on hand, eponymous, next, thick, ...
cheese        grated, baked over, small-minded, fat, French, low-fat, holey, Dutch, hand-made, green, spicy, self-made, produced, moldy
camembert     baked, low-fat, ripe
baked over    steak, aubergine, cauliflower, cheese
brought down  animal, game, deer, piece, prey, big game, wild cat, buffalo, roebuck, prey animal, whale, hart, bunny, grizzly, wild pig, boar, bear, ...
whole         life, bundle, piece, population, kind, fortune, heart, army, arsenal, village, country, ability, career, packet, chapter, quarter, pack, decade, ...

10 Mechanism of Inheritance
Algorithm:
  Initialize adjective and noun profiles;
  Initialize the start set;
  As long as new nouns get classified {
    calculate class probabilities for each adjective;
    for all yet unclassified nouns n {
      Multiply class probabilities per class of modifying adjectives;
      Assign the class with the highest probability to n;
    }
  }
Which class is assigned to N4 in the next step? (The question refers to the diagram on the slide.)
Class probabilities per adjective:
  count the number of modified nouns per class
  normalize by the total number of nouns in each class
  normalize to 1
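The inheritance loop on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data, and the smoothing constant `EPS` for classes an adjective has never been seen with are all hypothetical, and the product of probabilities is computed as a sum of logarithms to avoid underflow.

```python
import math
from collections import Counter, defaultdict

EPS = 1e-6  # assumed smoothing value for unseen (adjective, class) pairs


def adjective_class_probs(profiles, labels):
    """Per adjective: class distribution of the labelled nouns it modifies."""
    class_sizes = Counter(labels.values())
    counts = defaultdict(Counter)
    for noun, cls in labels.items():
        for adj in profiles.get(noun, ()):
            counts[adj][cls] += 1
    probs = {}
    for adj, per_class in counts.items():
        # Normalise by class size, then rescale so the values sum to 1.
        weighted = {c: k / class_sizes[c] for c, k in per_class.items()}
        total = sum(weighted.values())
        probs[adj] = {c: w / total for c, w in weighted.items()}
    return probs


def extend_lexicon(profiles, seed_labels, min_adj=1):
    """Repeat classification rounds until no new noun gets a class."""
    labels = dict(seed_labels)
    while True:
        probs = adjective_class_probs(profiles, labels)
        classes = sorted(set(labels.values()))
        newly = {}
        for noun, adjs in profiles.items():
            if noun in labels:
                continue
            known = [a for a in adjs if a in probs]
            if len(known) < min_adj:
                continue
            # Sum of logs equals the log of the product of probabilities.
            score = {
                c: sum(math.log(probs[a].get(c, EPS)) for a in known)
                for c in classes
            }
            newly[noun] = max(classes, key=score.get)
        if not newly:
            return labels
        labels.update(newly)


# Toy adjective profiles (hypothetical data in the style of the slides).
profiles = {
    "Teller": {"irden", "flach"},
    "Krug": {"irden", "gefüllt"},
    "Suppe": {"heiß", "dampfend"},
    "Topf": {"irden", "randvoll"},
    "Becher": {"randvoll"},
    "Tee": {"heiß", "dampfend"},
}
seed = {
    "Teller": "ax-mov-art-discrete",
    "Krug": "ax-mov-art-discrete",
    "Suppe": "art-substance",
}
result = extend_lexicon(profiles, seed)
```

In this toy run, Topf and Tee are classified in the first round. Becher has no classifying adjective at that point and only receives a class in the second round, after randvoll has inherited class information from the newly classified Topf.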

11 Example: Topf (pot)
Adjective profile of Topf (pot) = ax-mov-art-discrete:
angebrannt(X) heiß(-) ehern(-) fremd(-) divers(-) zerbeult(X) brodelnd(-) staatlich(-) gußeisern(-) tönern(X) gemeinsam(-) groß(-) irden(X) verschieden(-) verschlossen(-) anonym(-) rund(-) flach(-) Bremer(-) geschlossen(-) passend(-) gesondert(-) andere(-) riesig(-) Golden(-) eisern(-) europäisch(-) viel(-) öffentlich(-) mehr(-) golden(-) leer(-) klein(-) getrennt(-) möglich(-) speziell(-) übervoll(X) dampfend(-) gleich(-) gefüllt(-)
Class counts per adjective:
angebrannt (burnt): {nat-substance=1, art-substance=1, ax-mov-art-discrete=1}
  Suppe (soup)                                                         art-substance
  Zigarette (cigarette)                                                ax-mov-art-discrete
  Milch (milk)                                                         nat-substance
zerbeult (dented): {nonmov-art-discrete=1, mov-nonanimate-con-potag=2, nonax-mov-art-discrete=1, ax-mov-art-discrete=3}
  Wagen, Auto (wagon, car)                                             mov-nonanimate-con-potag
  Fahrzeug, Mountainbike, Posaune (vehicle, mountainbike, trombone)    ax-mov-art-discrete
  Mantel (coat)                                                        nonax-mov-art-discrete
  Dach (roof)                                                          nonmov-art-discrete
irden (earthen): {art-con-geogr=1, nonax-mov-art-discrete=1, ax-mov-art-discrete=9}
  Schal (shawl)                                                        nonax-mov-art-discrete
  Hafen (port)                                                         art-con-geogr
  Teller, Flasche, Schüssel, Becher, Geschirr, Vase, Krug, Gefäß, Napf (plate, bottle, bowl, cup, dishes, vase, mug, jar)    ax-mov-art-discrete
tönern (clay-made): {ax-mov-art-discrete=1, prot-discrete=1}
  Fuß (foot)                                                           prot-discrete
  Gefäß (mug)                                                          ax-mov-art-discrete
übervoll (over-filled): {nonmov-art-discrete=3, art-con-geogr=1, nonment-dyn-abs-situation=1, nonax-mov-art-discrete=1}
  Zimmer, Saal, Lager (room, hall, encampment)                         nonmov-art-discrete
  Stall (stable)                                                       art-con-geogr
  Vorlesung (lecture)                                                  nonment-dyn-abs-situation
  Tablett (tray)                                                       nonax-mov-art-discrete
Class probabilities:
  ax-mov-art-discrete          5.8E-8
  nonax-mov-art-discrete       2.1E-15
  nonmov-art-discrete          7.1E-20
  art-con-geogr                1.5E-20
  prot-discrete                5.0E-25
  nat-substance                3.3E-25
  art-substance                3.3E-25
  mov-nonanimate-con-potag     2.8E-25
  nonment-dyn-abs-situation    1.6E-25

12 Parameters
Minimal number of adjectives: minAdj
  A noun needs at least minAdj classifying adjectives. This avoids statistical noise and implies a frequency threshold.
Maximal number of classes per adjective: maxClass
  An adjective is only used for classification if it favours at most maxClass different classes, so that unspecific adjectives do not distort the results.
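The effect of the two parameters can be illustrated with the class counts from the Topf example on slide 11. This is a sketch that assumes "favours" means "has been seen with at least one noun of that class"; the helper names `usable_adjectives` and `is_classifiable` are hypothetical:

```python
def usable_adjectives(adj_class_counts, max_class):
    """Adjectives that occur with at most max_class different classes."""
    return {
        adj for adj, counts in adj_class_counts.items()
        if len(counts) <= max_class
    }


def is_classifiable(profile, usable, min_adj):
    """A noun needs at least min_adj usable adjectives in its profile."""
    return len(profile & usable) >= min_adj


# Class counts per adjective as listed on slide 11.
adj_class_counts = {
    "angebrannt": {"nat-substance": 1, "art-substance": 1,
                   "ax-mov-art-discrete": 1},
    "zerbeult": {"nonmov-art-discrete": 1, "mov-nonanimate-con-potag": 2,
                 "nonax-mov-art-discrete": 1, "ax-mov-art-discrete": 3},
    "irden": {"art-con-geogr": 1, "nonax-mov-art-discrete": 1,
              "ax-mov-art-discrete": 9},
    "tönern": {"ax-mov-art-discrete": 1, "prot-discrete": 1},
    "übervoll": {"nonmov-art-discrete": 3, "art-con-geogr": 1,
                 "nonment-dyn-abs-situation": 1,
                 "nonax-mov-art-discrete": 1},
}
# Part of the adjective profile of Topf; heiß and rund carry no class info.
topf_profile = {"angebrannt", "zerbeult", "irden", "tönern", "übervoll",
                "heiß", "rund"}

strict = usable_adjectives(adj_class_counts, max_class=2)
loose = usable_adjectives(adj_class_counts, max_class=5)
```

With maxClass=2 only tönern remains usable, so Topf would no longer reach minAdj=5; with maxClass=5 all five adjectives count. This shows how a strict maxClass trades coverage for more specific evidence.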

13 Experimental Data
4726 nouns comply with minAdj=5, which means a maximal recall of 78.2%.
In all experiments, 10-fold cross-validation was used.
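The recall ceiling follows from the fact that a noun with fewer than minAdj classifying adjectives can never be classified. The slide does not state the total number of nouns; the short sketch below derives it from the two given figures, so the value of `implied_total` is an inference, not a number from the presentation:

```python
def max_recall(eligible_nouns, total_nouns):
    """Upper bound on recall when only eligible nouns can be classified."""
    return eligible_nouns / total_nouns


# 4726 eligible nouns at a ceiling of 78.2% imply roughly this many nouns.
implied_total = round(4726 / 0.782)
ceiling_percent = round(100 * max_recall(4726, implied_total), 1)
```

The implied total of about 6043 nouns is in line with the per-feature counts of roughly 6000 reported in the result tables.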

14 Results: global classification
Classification was carried out directly on the 50 semantic classes.
Different measuring points correspond to the parameters minAdj in {5, 10, 15, 20} and maxClass in {2, 5, 50}.
Results are too poor for lexicon extension.

15 Combining single classifiers
Architecture: binary classifiers for single features, then combining the outcomes.
Parameters: minAdj=5, maxClass=2
Binary classifiers: ANIMAL +/-, ANIMATE +/-, ARTIF +/-, AXIAL +/-, ... (16 features); ab +/-, abs +/-, ad +/-, as +/-, ... (17 sorts)
Selection: compatible semantic classes that are minimal w.r.t. the hierarchy and unambiguous.
Result: a class, or reject.
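The selection step can be sketched as a compatibility check between the binary decisions and the feature definitions of the classes. The class definitions below are hypothetical stand-ins (the actual HaGenLex definitions are not part of the slides), and the sketch leaves out the "minimal w.r.t. the hierarchy" criterion, which would need the sort hierarchy:

```python
def combine(predictions, class_defs):
    """Return the single class compatible with all decisions, else None.

    predictions: feature -> True / False, or None if undecided
    class_defs:  class name -> {feature: required value}
    """
    compatible = [
        name for name, required in class_defs.items()
        if all(
            feat not in required or required[feat] == value
            for feat, value in predictions.items()
            if value is not None
        )
    ]
    # Reject when no class fits or when the decision is ambiguous.
    return compatible[0] if len(compatible) == 1 else None


# Hypothetical feature definitions for three classes.
class_defs = {
    "human-object": {"HUMAN": True, "ANIMATE": True, "ARTIF": False},
    "animal-object": {"HUMAN": False, "ANIMATE": True, "ARTIF": False},
    "ax-mov-art-discrete": {"HUMAN": False, "ANIMATE": False, "ARTIF": True},
}

clear = combine({"HUMAN": True, "ANIMATE": True, "ARTIF": False}, class_defs)
ambiguous = combine({"HUMAN": None, "ANIMATE": True, "ARTIF": False}, class_defs)
conflict = combine({"HUMAN": True, "ANIMATE": False, "ARTIF": True}, class_defs)
```

Rejecting ambiguous and contradictory outcomes is one way to read the "result class or reject" step: it favours precision at the cost of recall, which matches the direction of the combined results reported later in the talk.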

16 Results: single semantic features
For bias > 0.05: good to excellent precision
Total precision: 93.8% (86.8% for feature +)
Total recall: 70.7% (69.2% for feature +)
Name      Count   +      -      Bias
method    6004    12     5992   0.0020
instit    6032    39     5993   0.0065
mental    9008    162    8846   0.0180
info      6015    119    5896   0.0198
animal    5995    143    5852   0.0239
geogr     6015    188    5827   0.0313
thconc    6028    518    5510   0.0859
instru    5932    969    4963   0.1634
human     5995    1313   4682   0.2190
legper    6009    1352   4657   0.2250
animate   6010    1505   4505   0.2504
potag     6015    1664   4351   0.2766
artif     5864    2204   3660   0.3759
axial     5892    2260   3632   0.3836
movable   5827    2345   3482   0.4024
spatial   6033    2910   3123   0.4823
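The slide does not define the Bias column. The numbers in the table are consistent with bias being the share of the less frequent value (+ or -) among all nouns annotated for the feature; the sketch below encodes that reading as an assumption and checks it against three rows of the table:

```python
def bias(positive, negative):
    """Share of the minority value among all annotated nouns (assumed)."""
    return min(positive, negative) / (positive + negative)


# (name, +, -) for three rows of the table on this slide.
rows = [("method", 12, 5992), ("human", 1313, 4682), ("spatial", 2910, 3123)]
computed = {name: round(bias(pos, neg), 4) for name, pos, neg in rows}
```

Under this reading, a bias close to 0.5 means a balanced feature, and a bias close to 0 means that almost all nouns share the same value, which leaves few training examples for the rare value.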

17 Results: ontological sorts
For bias > 0.10: good to excellent precision
Total precision: 94.1% (89.5% for sort +)
Total recall: 73.6% (69.6% for sort +)
Name   Count   +      -      Bias
re     6033    7      6026   0.0012
mo     6033    8      6025   0.0013
o-     6033    5994   39     0.0065
oa     6045    41     6004   0.0068
me     6045    41     6004   0.0068
qn     6045    41     6004   0.0068
ta     6033    107    5926   0.0177
s      6010    224    5786   0.0373
as     6031    363    5668   0.0602
na     6033    411    5622   0.0681
at     6033    450    5583   0.0746
io     6033    664    5369   0.1101
ad     6031    1481   4550   0.2456
abs    6033    1846   4187   0.3060
d      6010    2663   3347   0.4431
co     6033    2910   3123   0.4823
ab-    6033    3082   2951   0.4891

18 Results: combined semantic classes
No connection between class size and results is visible.
Total precision: 80.2%
Total recall: 34.2%
Number of newly classified nouns: 6649
(Rows marked * give only one value in the source; whether it is precision or recall cannot be recovered.)
Class                        Count   Prec     Rec
nonment-dyn-abs-situation    1421    89.19    34.27
human-object                 1313    96.82    69.54
prot-theor-concept           516     53.71    18.22
nonoper-attribute            411     0.00 *
ax-mov-art-discrete          362     55.64    40.88
nonment-stat-abs-situation   226     36.84    6.19
animal-object                143     100.0    26.57
nonmov-art-discrete          133     57.41    23.31
ment-stat-abs-situation      126     51.28    15.87
nonax-mov-art-discrete       108     31.48    15.74
tem-abstractum               107     96.77    28.04
mov-nonanimate-con-potag     98      70.45    31.63
art-con-geogr                96      58.70    28.12
abs-info                     94      42.31    11.70
art-substance                88      60.47    29.55
nat-discrete                 88      100.0    31.82
nat-substance                86      57.14    9.30
prot-discrete                73      100.0    57.53
nat-con-geogr                63      65.00    20.63
prot-substance               50      100.0    40.00
mov-art-discrete             45      100.0    37.78
meas-unit                    41      90.91    24.39
oper-attribute               39      0.00 *
Institution                  39      0.00 *
ment-dyn-abs-situation       36      0.00 *
plant-object                 34      100.0    8.82
mov-nat-discrete             27      22.22 *
con-info                     25      40.00    8.00
Rest                         157     39.24    19.75

19 Typical mistakes
Pflanze (plant): animal-object instead of plant-object
  zart, fleischfressend, fressend, verändert, genmanipuliert, transgen, exotisch, selten, giftig, stinkend, wachsend, ...
Nachwuchs (offspring): human-object instead of animal-object
  wissenschaftlich, qualifiziert, akademisch, eigen, talentiert, weiblich, hoffnungsvoll, geeignet, begabt, journalistisch, ...
Café (café): art-con-geogr instead of nonmov-art-discrete (cf. Restaurant)
  Wiener, klein, türkisch, kurdisch, romanisch, cyber, philosophisch, besucht, traditionsreich, schnieke, gutbesucht, ...
Neger (negro): animal-object instead of human-object
  weiß, dreckig, gefangen, faul, alt, schwarz, nackt, lieb, gut, brav
but: Skinhead (skinhead): human-object (ok)
  {16,17,18,19,20,21,22,23,30}jährig, gleichaltrig, zusammengeprügelt, rechtsradikal, brutal
In most cases the wrong class is semantically close. The evaluation metrics did not account for that.

20 Any Questions?
Thank you very much!

