Conceptual noun types: grammar and automatic classification Christian Horn & Christof Rumpf CTF 07, Düsseldorf Institute for Language and Information Depts.


1 Conceptual noun types: grammar and automatic classification
Christian Horn & Christof Rumpf
CTF 07, Düsseldorf
Institute for Language and Information, Depts. of General Linguistics & Computational Linguistics

2 CTF 07, Horn & Rumpf: Conceptual noun types: grammar and automatic classification
Structure
1. The four conceptual noun types and their contextual properties
2. Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus
3. A framework for the automatic classification of concept types
4. Conclusion

3 Conceptual noun types
1. The four conceptual noun types and their contextual properties

Classification after Löbner (1979, 1985, 1998):

               | not inherently unique                             | inherently unique
non-relational | sortal (SC): rose, car, horse, house, table, noun | individual (IC): pope, weather, proper names, sun, semantics
relational     | relational (RC): sister, uncle, arm, leg, part    | functional (FC): mother, wife, size, weight, meaning

Conceptual noun types differ in their referential properties. Do they also differ in their grammatical uses?

4 Grammatical uses of conceptual noun types
(§ = use differing from the underlying concept type)

Sortal concepts
  A rose is a nice present.
  Many roses are an even nicer present.

Individual concepts
  The sun is burning.
  § A sun is burning.
  § The suns are burning.
  § Many suns are burning.
  § My sun is burning. / § The sun of mine is burning.

5 Grammatical uses of conceptual noun types

Relational concepts
  One of Mary's legs is too short.
  § Mary's leg is too short. / § The leg of Mary is too short.
  § Many legs of Mary are too short.

Functional concepts
  Mary is Peter's mother. / Mary is the mother of Peter.
  § Mary is a mother of Peter.
  § Mary is the mother.

6 Contextual properties of conceptual noun types

Grammatical characteristics:
- possessive use: his mother / mother of him
- definiteness: the sun
- subcategorization: certain verbs require IC/FC as complements

Morphological properties: certain nouns are often functional
- deadjectival nouns (Intelligenz 'intelligence')
- deverbal nouns (Krümmung 'bend', Dauer 'length')
- compounds:
    -wert 'value': Bestwert 'optimum value'
    -grad 'degree': Wirkungsgrad 'degree of efficiency'
    -größe 'size': Kleidergröße 'dress size'

7 2. Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus

Goals:
- to identify the possible uses of the different concept types and their specific context features
- to develop and implement a method for the automatic classification of concept types in texts, based on morphosyntactic features

Hybrid approach:
- semantic and grammatical analysis of the conceptual noun types
- statistical investigation: automatic classification allows the processing of large amounts of data
- the investigation is initially carried out on a German text corpus ( words) that serves as a training corpus
- perspective: further research is intended on English, French, and Japanese

8 Predictions

Assumptions:
- The lexicalized concept type of a noun is the type in which that noun is most frequently used.
- Conceptual noun types occur particularly often in grammatical uses that match their underlying conceptual properties:
    sortal concepts (rose): singular, plural, with quantifiers, indefinite, ...
    individual concepts (sun): singular, definite
    relational concepts (leg): indefinite, possessive
    functional concepts (mother): singular, definite, possessive

Other uses (type shifts) are still possible. The conditions under which these type shifts occur still have to be investigated.

9 Counting (selection, definiteness)

Token     | Type | # total | def.(1) sg. | def.(1) pl. | quant./indef.(2) sg. | quant./indef.(2) pl. | Ø(3) sg. | Ø(3) pl.
          |      |         | -n(4)       | -n          | -n                   | -n                   | -n       | -n
Nomen     | SC   |         |             |             |                      |                      |          |
Semantik  | IC   |         |             |             |                      |                      |          |
Teil      | RC   |         |             |             |                      |                      |          |
Bedeutung | FC   |         |             |             |                      |                      |          |

(1) definite: def. determiner, poss. pron., gen. pron., d-Prep, d-selb, d-einzig, genitive (deren/dessen), d-jen
(2) quantifiers/indefinite: quantifiers, indefinite determiner, demonstratives, numbers, kein, d-beid, d-ord
(3) null determiner
(4) incl

10 Results (selection)

Conceptual noun type | Example (tokens)           | singular, definite | possessive
Sortal concept       | Nomen (noun) (166)         | 37 %               | 0 %
Relational concept   | Teil (part) (124)          | 36 %               | 73 %
Individual concept   | Semantik (semantics) (152) | 82 %               | 4 %
Functional concept   | Bedeutung (meaning) (721)  | 57 %               | 74 %

Results so far confirm our predictions.

11 Tasks & Challenges
- Type shifts in certain readings:
    The meaning of the word. (FC)
    The word bottle has many meanings. (RC)
- Generic and anaphoric uses:
    The lightbulb was invented by Heinrich Göbel. (generic)
- Polysemy
- Analysis of possessive constructions, plurals, null determiner

12 3. A framework for the automatic classification of concept types
- Architecture
- Training corpus
- Morphosyntactic analysis
- Training sample
- Computing classifiers
- Maximum entropy models
- Conclusion

13 Architecture of the framework (learning / application of a classifier)

Learning:
  training corpus (manual annotation of concept types)
  → morphosyntactic analysis (msyn: dependency grammar parser)
  → extraction of relevant context features
  → training sample
  → Generalized Iterative Scaling
  → maximum entropy model

Application:
  test corpus
  → morphosyntactic analysis
  → test sample
  → maximum entropy model
  → annotated test corpus

14 Training corpus
- Manually annotated version of Löbner (2003), Semantik
- Concept types of nouns are marked with tags

Sample text (German):
Die Semantik ist das Teilgebiet der Linguistik, das sich mit Bedeutung befasst. Diese Art von Definition mag vielleicht ihrem Freund genügen, der Sie zufällig mit diesem Buch in der Hand sieht und Sie fragt, was denn nun schon wieder sei, aber als Autor einer solchen Einführung muss ich natürlich präziser erklären, was der Gegenstand dieser Wissenschaft ist.

English gloss: Semantics is the branch of linguistics that deals with meaning. This kind of definition may perhaps satisfy your friend, who happens to see you with this book in your hand and asks you what that is supposed to be this time, but as the author of such an introduction I must of course explain more precisely what the subject matter of this science is.

15 Morphosyntactic analysis
- We use Connexor's msyn to analyse German texts.
- Syntactic information consists of dependency trees.
- Morphological features include part of speech, gender, number, case, tense, mood, and some more.
- We do some postprocessing ourselves, e.g. to add definiteness markers.

16 Dependency tree

Die Semantik ist das Teilgebiet der Linguistik, …
'Semantics is the branch of linguistics …'

main - ist
  subj - Semantik
    det - Die (Def)
  comp - Teilgebiet
    det - das (Def)
    mod - Linguistik (Gen, possessor)
      det - der (Def)

17 Output of Connexor's msyn

Die         die          det   PREMOD DET Def FEM SG NOM
Semantik    semantik     subj  NH N FEM SG NOM
ist         sein         main  MAIN V IND PRES SG P3
das         das          det   PREMOD DET Def NEU SG NOM
Teilgebiet  teil#gebiet  comp  NH N NEU SG NOM
der         die          det   PREMOD DET Def FEM SG GEN
Linguistik  linguistik   mod   NH N FEM SG GEN
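
The msyn output above can be read into a simple data structure before any features are extracted. The following is a minimal sketch, assuming one token per line and whitespace-separated columns in the order form, lemma, dependency function, remaining tags. The names `Token` and `parse_msyn_line` are hypothetical and not part of msyn.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str       # surface form, e.g. "Semantik"
    lemma: str      # base form, e.g. "semantik"
    function: str   # dependency function, e.g. "subj"
    tags: tuple     # remaining syntactic and morphological tags


def parse_msyn_line(line: str) -> Token:
    """Split one output line into form, lemma, function, and tags (assumed layout)."""
    form, lemma, function, *tags = line.split()
    return Token(form, lemma, function, tuple(tags))


SAMPLE = """\
Die die det PREMOD DET Def FEM SG NOM
Semantik semantik subj NH N FEM SG NOM
ist sein main MAIN V IND PRES SG P3
das das det PREMOD DET Def NEU SG NOM
Teilgebiet teil#gebiet comp NH N NEU SG NOM
der die det PREMOD DET Def FEM SG GEN
Linguistik linguistik mod NH N FEM SG GEN
"""

tokens = [parse_msyn_line(line) for line in SAMPLE.splitlines() if line.strip()]

# Nouns carry the part-of-speech tag "N" as a separate tag.
nouns = [t for t in tokens if "N" in t.tags]
print([t.lemma for t in nouns])
```

Only the nouns are candidates for concept type classification, so a filter like the last step is where a feature extractor would start.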

18 Training sample
Relevant contextual features are extracted with regular expressions that are mapped onto the dependency trees, using the programming language Perl. The result is a set of pairs (concept type, list of context features):

(f1, [tnr=2, tok=semantik, suff=ik, num=sg, art=def])
(r2, [tnr=5, tok=teilgebiet, num=sg, art=def, poss=rgen])
(f1, [tnr=7, tok=linguistik, suff=ik, num=sg, art=def])
(f2, [tnr=12, tok=bedeutung, suff=ung, num=sg, art=none])
(r2, [tnr=16, tok=art, num=sg, art=indef, poss=von])
(f2, [tnr=18, tok=definition, num=sg, art=none])
(r2, [tnr=22, tok=freund, num=sg, art=def])
(so, [tnr=30, tok=buch, num=sg, art=indef])
(r2, [tnr=33, tok=hand, num=sg, art=def])
(f2, [tnr=49, tok=autor, num=sg, art=none])
(r2, [tnr=52, tok=einführung, suff=ung, num=sg, art=indef])
(f2, [tnr=61, tok=gegenstand, num=sg, art=def])
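
The authors extract these features with Perl regular expressions over dependency trees. The sketch below is not their method: it is a simplified Python approximation that works on the flat token list and reproduces the first three feature lists for the sample sentence. The suffix list, the rule that a determiner directly precedes its noun, the two-token window for a right genitive, and the use of `indef` for any determiner without a `Def` tag are all assumptions made for illustration.

```python
SUFFIXES = ("ik", "ung")  # hypothetical list, chosen to match the sample pairs

# (form, lemma, dependency function, tags) for the sample sentence
SENTENCE = [
    ("Die", "die", "det", {"PREMOD", "DET", "Def", "FEM", "SG", "NOM"}),
    ("Semantik", "semantik", "subj", {"NH", "N", "FEM", "SG", "NOM"}),
    ("ist", "sein", "main", {"MAIN", "V", "IND", "PRES", "SG", "P3"}),
    ("das", "das", "det", {"PREMOD", "DET", "Def", "NEU", "SG", "NOM"}),
    ("Teilgebiet", "teil#gebiet", "comp", {"NH", "N", "NEU", "SG", "NOM"}),
    ("der", "die", "det", {"PREMOD", "DET", "Def", "FEM", "SG", "GEN"}),
    ("Linguistik", "linguistik", "mod", {"NH", "N", "FEM", "SG", "GEN"}),
]


def extract_features(sentence):
    """Return one feature dictionary per noun token (simplified heuristics)."""
    samples = []
    for i, (_form, lemma, _function, tags) in enumerate(sentence):
        if "N" not in tags:
            continue
        feats = {"tnr": i + 1, "tok": lemma.replace("#", "")}
        for suffix in SUFFIXES:
            if feats["tok"].endswith(suffix):
                feats["suff"] = suffix
                break
        feats["num"] = "sg" if "SG" in tags else "pl"
        # Article: assume the determiner, if any, directly precedes the noun.
        prev = sentence[i - 1] if i > 0 else None
        if prev is not None and prev[2] == "det" and "DET" in prev[3]:
            feats["art"] = "def" if "Def" in prev[3] else "indef"
        else:
            feats["art"] = "none"
        # Right genitive: a genitive noun modifier within the next two tokens.
        for later in sentence[i + 1:i + 3]:
            if "N" in later[3] and "GEN" in later[3] and later[2] == "mod":
                feats["poss"] = "rgen"
        samples.append(feats)
    return samples


features = extract_features(SENTENCE)
for feats in features:
    print(feats)
```

The concept type labels (f1, r2, ...) are not produced here; in the training sample they come from the manual annotation.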

19 Automatic classification

Given:
- training sample $= \{(a_1, b_1), \ldots, (a_n, b_n)\}$
- classes $a_i \in \{f_1, f_2, r_1, r_2\}$
- contexts $b_i = \{m_1, \ldots, m_m\}$
- features $m_i \in \{$art=def, art=indef, poss=lgen, …$\}$

Searched:
- classifier $p(a \mid b)$: how probable is class $a$ given context $b$?
- maximal argument $a^* = \arg\max_a p(a \mid b)$: which is the most probable class $a$ given context $b$?
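
Once some classifier p(a|b) is available, picking the most probable class is a one-line arg max. The sketch below assumes a hypothetical probability table for a single context; the numbers are invented for illustration and do not come from the corpus.

```python
def classify(p, classes, context):
    """Return the class a that maximizes p(a | context)."""
    return max(classes, key=lambda a: p(a, context))


CLASSES = ["f1", "f2", "r1", "r2"]

# Hypothetical conditional probabilities for one context (illustration only).
CONTEXT = frozenset({"num=sg", "art=def"})
TOY_TABLE = {
    ("f1", CONTEXT): 0.6,
    ("f2", CONTEXT): 0.1,
    ("r1", CONTEXT): 0.1,
    ("r2", CONTEXT): 0.2,
}


def toy_p(a, context):
    """Look up p(a | context) in the toy table; unknown pairs get probability 0."""
    return TOY_TABLE.get((a, frozenset(context)), 0.0)


best = classify(toy_p, CLASSES, {"num=sg", "art=def"})
print(best)
```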

20 Computing a (bad) classifier

Simplest account:
- Counting co-occurrences of classes and contexts.

Shortcomings:
- Only the contexts in the training sample are learned.
- Varying degrees of evidence of single features are disregarded.

Way out:
- Computation of the classifier with a maximum entropy model.
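
The counting approach and its first shortcoming can be made concrete. The slide's formula is not preserved in the transcript; the sketch below assumes it is the usual relative frequency, p(a|b) = count(a, b) / count(b). The training pairs are invented for illustration.

```python
from collections import Counter


def train_by_counting(sample):
    """Build p(a | b) as a relative frequency over whole contexts."""
    pair_counts = Counter((a, frozenset(b)) for a, b in sample)
    context_counts = Counter(frozenset(b) for _, b in sample)

    def p(a, b):
        b = frozenset(b)
        if context_counts[b] == 0:
            return 0.0  # unseen context: nothing was learned about it
        return pair_counts[(a, b)] / context_counts[b]

    return p


# Hypothetical training pairs (class, context).
SAMPLE = [
    ("f1", {"num=sg", "art=def"}),
    ("f1", {"num=sg", "art=def"}),
    ("r2", {"num=sg", "art=def"}),
    ("r2", {"num=sg", "art=indef", "poss=von"}),
]

p = train_by_counting(SAMPLE)
seen = p("f1", {"num=sg", "art=def"})       # context occurs in the sample
unseen = p("r2", {"num=sg", "art=indef"})   # same features, new combination
print(seen, unseen)
```

The second query shares both of its features with training contexts, yet the classifier has no estimate for it, because contexts are treated as unanalysed wholes.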

21 Maximum entropy models

Basics
- Entropy: the average number of bits required to encode events of a particular type (tossing a coin: 1 bit; rolling a die: about 2.6 bits).
- Principle of maximum entropy: among the models consistent with the data, choose the one with maximum entropy, i.e. don't go beyond the data.

Specific features
- Decomposition of contexts into single context features or their combinations.
- Possibility to combine features from heterogeneous sources (e.g. syntax, semantics, morphology, …).
- Computation of the weights (evidence) of single features or their combinations for every class over all contexts.
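
The entropy figures for the coin and the die can be checked with Shannon's formula, H = -Σ p log2 p. A minimal sketch follows; the biased coin is an added illustration, not an example from the slides.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


coin = entropy([0.5, 0.5])      # fair coin
die = entropy([1 / 6] * 6)      # fair six-sided die, equals log2(6)
biased = entropy([0.9, 0.1])    # a biased coin is more predictable

print(coin, round(die, 3), round(biased, 3))
```

The uniform distributions have the highest entropy for their number of outcomes, which is why a maximum entropy model stays as close to uniform as the observed data allow.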

22 Contextual and binary features
The weights for contextual features are determined indirectly with binary features. These relate classes and contextual features.
- Simple binary features relate a class to a single contextual feature.
- Complex binary features relate a class to a combination of contextual features.
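
The slide's own example and instance are not preserved in the transcript. As a hedged illustration in the style of Ratnaparkhi's binary features, a feature can be written as a function f(a, b) that returns 1 when the class matches and the context contains the required contextual feature(s). The helper names and the chosen class and feature values are hypothetical.

```python
def simple_feature(cls, context_feature):
    """Binary feature: 1 if the class is cls and the context has context_feature."""
    def f(a, b):
        return 1 if a == cls and context_feature in b else 0
    return f


def complex_feature(cls, context_features):
    """Binary feature: 1 if the class is cls and the context has all given features."""
    required = frozenset(context_features)

    def f(a, b):
        return 1 if a == cls and required <= set(b) else 0
    return f


# Hypothetical instances.
f_simple = simple_feature("f1", "art=def")
f_complex = complex_feature("r2", {"art=def", "poss=rgen"})

context = {"num=sg", "art=def", "poss=rgen"}
print(f_simple("f1", context), f_simple("r2", context))
print(f_complex("r2", context), f_complex("r2", {"art=def"}))
```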

23 Maximum entropy framework

$$p(a \mid b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$$

where $\alpha_j > 0$ is a weight for feature $f_j$, $k$ is the total number of binary features, and $Z(b)$ is a normalization constant that ensures $\sum_a p(a \mid b) = 1$ (i.e. 100%); cf. Ratnaparkhi.
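
Given weights and binary features, the class probabilities follow from a product over the active features, divided by the normalization constant Z(b). The sketch below assumes the product form used by Ratnaparkhi; the two features and their weights are invented for illustration.

```python
import math


def maxent_probability(a, b, classes, features, alphas):
    """p(a | b) = (1 / Z(b)) * product over j of alphas[j] ** features[j](a, b)."""
    def score(cls):
        return math.prod(alpha ** f(cls, b) for f, alpha in zip(features, alphas))

    z = sum(score(cls) for cls in classes)  # normalization constant Z(b)
    return score(a) / z


CLASSES = ["f1", "r2"]

# Hypothetical binary features and weights.
FEATURES = [
    lambda a, b: 1 if a == "f1" and "art=def" in b else 0,
    lambda a, b: 1 if a == "r2" and "poss=rgen" in b else 0,
]
ALPHAS = [3.0, 4.0]

ctx = {"art=def", "poss=rgen"}
p_f1 = maxent_probability("f1", ctx, CLASSES, FEATURES, ALPHAS)
p_r2 = maxent_probability("r2", ctx, CLASSES, FEATURES, ALPHAS)
print(p_f1, p_r2)
```

A weight above 1 raises the probability of the class its feature points to, and a weight below 1 lowers it. An inactive feature contributes a factor of 1.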

24 Generalized Iterative Scaling
Unfortunately, there is no analytical method to determine the weights $\alpha_j$. There are some iterative approximation algorithms to determine the $\alpha_j$, which converge to a correct $p(a \mid b)$ and respect the principle of maximum entropy. We use Generalized Iterative Scaling (GIS):

initialization: $$\alpha_j^{(0)} = 1$$

iteration: $$\alpha_j^{(n+1)} = \alpha_j^{(n)} \left[ \frac{E_{\tilde{p}} f_j}{E_{p^{(n)}} f_j} \right]^{1/C}$$

where $E_{\tilde{p}} f_j$ is the expectation value for feature $f_j$ in the training corpus and $E_{p^{(n)}} f_j$ is the expectation value for feature $f_j$ in the previous iteration. The constant $C$ is the total number of active binary features over all contexts.
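
A compact GIS loop can be sketched as follows. It is a toy illustration under stated assumptions, not the authors' implementation: C is taken as the largest number of features active for any single (class, context) pair, a correction feature pads every pair up to C as in Ratnaparkhi's description, and features that are never observed in the sample simply keep weight 1. The training data are invented.

```python
import math


def gis(sample, classes, features, iterations=100):
    """Estimate maximum entropy weights with Generalized Iterative Scaling."""
    contexts = [b for _, b in sample]
    n = len(sample)
    # C: largest number of active features for any (class, context) pair.
    C = max(sum(f(a, b) for f in features) for a in classes for b in contexts)

    def correction(a, b):
        # Pads the feature sum of every pair up to the constant C.
        return C - sum(f(a, b) for f in features)

    all_features = list(features) + [correction]
    # Expectation of each feature in the training sample.
    observed = [sum(f(a, b) for a, b in sample) / n for f in all_features]
    alphas = [1.0] * len(all_features)  # initialization

    def p(a, b):
        scores = {
            c: math.prod(al ** f(c, b) for f, al in zip(all_features, alphas))
            for c in classes
        }
        return scores[a] / sum(scores.values())

    for _ in range(iterations):
        # Expectation of each feature under the current model.
        expected = [
            sum(p(a, b) * f(a, b) for b in contexts for a in classes) / n
            for f in all_features
        ]
        alphas = [
            al * (obs / exp) ** (1.0 / C) if obs > 0 and exp > 0 else al
            for al, obs, exp in zip(alphas, observed, expected)
        ]
    return p


CLASSES = ["f", "r"]
FEATURES = [
    lambda a, b: 1 if a == "f" and "art=def" in b else 0,
    lambda a, b: 1 if a == "r" and "art=indef" in b else 0,
]
# Hypothetical sample: 3 of 4 definite uses are "f", 2 of 3 indefinite uses are "r".
SAMPLE = (
    [("f", {"art=def"})] * 3 + [("r", {"art=def"})]
    + [("r", {"art=indef"})] * 2 + [("f", {"art=indef"})]
)

model = gis(SAMPLE, CLASSES, FEATURES)
print(round(model("f", {"art=def"}), 3), round(model("r", {"art=indef"}), 3))
```

On this toy sample the fitted model reproduces the observed proportions, because each feature's model expectation is driven towards its expectation in the training data.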

25 4. Conclusion
- The investigations so far support the assumption that the referential properties of the concept types match their grammatical uses.
- The maximum entropy framework allows a fine-grained analysis of the evidence that a single context feature contributes to the classification.
- The selection of relevant features is essential for the success of the automatic classification. Our research objective consists to a great extent in examining these features.
- We are starting experiments with complex features to model the combined evidence of context features.

