Knowledge Summarization using Decision Logic Case Study: Polish Gender Theory Paweł STACEWICZ & André WLODARCZYK.

ABOUT DECISION LOGIC
Decision Logic (DL) was proposed by Zdzisław Pawlak as a formal tool that connects his theory of rough sets (RST) with the concept of reasoning in knowledge representation systems (KRS) [Pawlak Z., 1991]. The main concern of DL is induction (not deduction), i.e. this tool is best suited for discovering dependencies in data and for reducing knowledge. Atomic formulas of DL have the form $(a, v)$ or $a_v$, meaning that attribute $a$ of the object under observation has value $v$. Compound formulas of DL are built from atomic formulas and the usual logical connectives, such as $\wedge$, $\vee$ and $\to$. From the inductive point of view, the most important formulas are decision rules, which take the form $p_1 \wedge p_2 \wedge \dots \wedge p_n \to q$ (where $p_i$ and $q$ are atomic formulas). A set of decision rules is called a decision algorithm.

DECISION TABLES
Decision tables are the clearest representation of sets of decision rules (i.e. decision algorithms). Each row of such a table corresponds to one rule and each column to one attribute of the objects under observation. The set of columns is divided into two categories: condition columns, which correspond to the attributes in the predecessors of rules, and decision columns, which correspond to the attributes in the successors of rules.
Sample decision table:
Object | a | b | c | d
x1 | 1 | 0 | 1 | 0
x2 | 1 | 2 | 0 | 0
x3 | 0 | 1 | 1 | 1
x4 | 1 | 1 | 1 | 1
Corresponding set of rules (decision algorithm):
$r_1: a_1 \wedge b_0 \wedge c_1 \to d_0$
$r_2: a_1 \wedge b_2 \wedge c_0 \to d_0$
$r_3: a_0 \wedge b_1 \wedge c_1 \to d_1$
$r_4: a_1 \wedge b_1 \wedge c_1 \to d_1$
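The row-to-rule correspondence can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper names `row_to_rule` and `decision_algorithm` are our own (hypothetical), and the values are those of the sample table, read off rules r1 to r4.

```python
# Sample decision table: condition attributes a, b, c; decision attribute d.
TABLE = {
    "x1": {"a": 1, "b": 0, "c": 1, "d": 0},
    "x2": {"a": 1, "b": 2, "c": 0, "d": 0},
    "x3": {"a": 0, "b": 1, "c": 1, "d": 1},
    "x4": {"a": 1, "b": 1, "c": 1, "d": 1},
}
CONDITIONS = ["a", "b", "c"]
DECISIONS = ["d"]


def row_to_rule(row, conditions=CONDITIONS, decisions=DECISIONS):
    """Read one table row as a decision rule: the condition columns form the
    predecessor (a conjunction), the decision columns form the successor."""
    predecessor = " ∧ ".join(f"{a}{row[a]}" for a in conditions)
    successor = " ∧ ".join(f"{a}{row[a]}" for a in decisions)
    return f"{predecessor} → {successor}"


def decision_algorithm(table):
    """Return the rule set (decision algorithm) encoded by a decision table."""
    return {obj: row_to_rule(row) for obj, row in table.items()}


for obj, rule in decision_algorithm(TABLE).items():
    print(obj, ":", rule)
```

Each printed line has the shape `x1 : a1 ∧ b0 ∧ c1 → d0`, i.e. one rule per row.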

DECISION LOGIC – SOME EXAMPLES
Decision table (model):
Object | a | b | c | d
x1 | 1 | 0 | 1 | 0
x2 | 1 | 2 | 0 | 0
x3 | 0 | 1 | 1 | 1
x4 | 1 | 1 | 1 | 1
Sample formulas:
- $a_1$, $b_0$, $d_0$ (atomic formulas)
- $a_0 \wedge b_1$, $a_1 \wedge b_2 \wedge c_1$ (conjunctions)
- $a_0 \vee b_1$, $a_1 \vee b_2 \vee c_1$ (alternatives)
- $a_1 \wedge b_1 \wedge c_1 \to d_1$ (implication, i.e. decision rule)
- $a_1 \vee (a_0 \wedge c_1) \vee (b_0 \wedge c_0) \to d_0$ (disjunctive decision rule)
Meanings (extensions) of formulas:
$|a_1| = \{x_1, x_2, x_4\}$, $|b_0| = \{x_1\}$
$|a_0 \wedge b_1| = \{x_3\}$, $|a_1 \vee b_2 \vee c_1| = \{x_1, x_2, x_3, x_4\}$
$|a_1 \wedge b_1 \wedge c_1 \to d_1| = \{x_1, x_2, x_3, x_4\}$
Note: the last two formulas are satisfied by all objects $x_i$ and for this reason are said to be true in the model (i.e. with respect to the observations collected in the decision table).
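The meaning $|f|$ of a formula can be computed recursively over the table. The sketch below is a minimal illustration under our own assumptions: formulas are encoded as nested tuples (a hypothetical encoding), the table values are read off rules r1 to r4, and the implication is given its usual rough-set meaning $|p \to q| = (U \setminus |p|) \cup |q|$. On this table, $a_1$ holds for x1, x2 and x4.

```python
# Decision table used as the model M (values read off rules r1..r4).
TABLE = {
    "x1": {"a": 1, "b": 0, "c": 1, "d": 0},
    "x2": {"a": 1, "b": 2, "c": 0, "d": 0},
    "x3": {"a": 0, "b": 1, "c": 1, "d": 1},
    "x4": {"a": 1, "b": 1, "c": 1, "d": 1},
}


def meaning(formula, table=TABLE):
    """Return |formula|, the set of objects of the table that satisfy it.

    Hypothetical encoding: ("atom", attr, value), ("and", f, g, ...),
    ("or", f, g, ...), ("implies", p, q).
    """
    universe = set(table)
    op = formula[0]
    if op == "atom":
        _, attr, value = formula
        return {x for x, row in table.items() if row[attr] == value}
    if op == "and":
        return set.intersection(*(meaning(f, table) for f in formula[1:]))
    if op == "or":
        return set.union(*(meaning(f, table) for f in formula[1:]))
    if op == "implies":
        _, p, q = formula
        # |p -> q| = (U - |p|) ∪ |q|
        return (universe - meaning(p, table)) | meaning(q, table)
    raise ValueError(f"unknown connective: {op}")


def is_true_in_model(formula, table=TABLE):
    """A formula is true in the model when every object satisfies it."""
    return meaning(formula, table) == set(table)


def atom(attr, value):
    return ("atom", attr, value)


rule = ("implies", ("and", atom("a", 1), atom("b", 1), atom("c", 1)), atom("d", 1))
print(sorted(meaning(atom("a", 1))))
print(is_true_in_model(rule))
```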

SIMPLIFICATION OF KNOWLEDGE – IDEA
One of the most interesting applications of DL is the simplification of a given rule set in such a way that the final set of rules has the same “decision-making strength” as the initial set. The main goal of simplification is to reduce as far as possible the number of rules and the number of components in the predecessors of rules.

SIMPLIFICATION OF KNOWLEDGE – PROCEDURE
Step 1: ELIMINATION OF SUPERFLUOUS ATTRIBUTES (deletion of decision table columns)
- Find all the superfluous attributes in R, the core of R and the reducts of R.
- Choose one of the reducts and limit the subsequent steps to this reduct.
Step 2: SIMPLIFICATION OF INDIVIDUAL RULES (deletion of some row entries)
- Find the core of each rule $r_i$ (the set of its necessary attributes).
- For each rule $r_i$, find the set of its reducts $RED(r_i) = \{r_{i1}, r_{i2}, \dots\}$.
Step 3: SIMPLIFICATION OF FINAL (DISJUNCTIVE) RULES (merging rows, then deleting certain components of the merged rows)
- For each combination of decision attribute values, create one final disjunctive rule.
- Find the reducts of all the disjunctive rules.

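EXAMPLE – Initial knowledge
Initial decision table T:
Object | a | b | c | d | e
x1 | 1 | 0 | 1 | 1 | 1
x2 | 1 | 0 | 0 | 0 | 1
x3 | 0 | 0 | 0 | 0 | 0
x4 | 1 | 1 | 0 | 1 | 0
x5 | 1 | 1 | 0 | 2 | 2
x6 | 2 | 2 | 0 | 2 | 2
x7 | 2 | 2 | 2 | 2 | 2
Corresponding set of rules R:
$r_1: a_1 \wedge b_0 \wedge c_1 \wedge d_1 \to e_1$
$r_2: a_1 \wedge b_0 \wedge c_0 \wedge d_0 \to e_1$
$r_3: a_0 \wedge b_0 \wedge c_0 \wedge d_0 \to e_0$
$r_4: a_1 \wedge b_1 \wedge c_0 \wedge d_1 \to e_0$
$r_5: a_1 \wedge b_1 \wedge c_0 \wedge d_2 \to e_2$
$r_6: a_2 \wedge b_2 \wedge c_0 \wedge d_2 \to e_2$
$r_7: a_2 \wedge b_2 \wedge c_2 \wedge d_2 \to e_2$
Our goal: to reduce as far as possible the number of rules and the number of components in their predecessors.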

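STEP 1 – Simplification of the Set of Rules
Explanation. Attribute c is superfluous (column c can be deleted) because all the decision rules remain true without it: after deleting the c-formulas from the rules, the new rule set contains no inconsistent rules. The same test shows that b is also superfluous on its own (without b, rules r1 and r4 still differ in c), but b and c cannot be deleted together: r1 and r4 would then share the predecessor $a_1 \wedge d_1$ with different decisions.
Superfluous attributes: b, c (each taken individually)
Necessary attributes: a, d
CORE(R) = {a, d}
RED(R) = {{a, b, d}, {a, c, d}}; we choose the reduct {a, b, d}, i.e. we delete column c.
New decision table T':
Object | a | b | d | e
x1 | 1 | 0 | 1 | 1
x2 | 1 | 0 | 0 | 1
x3 | 0 | 0 | 0 | 0
x4 | 1 | 1 | 1 | 0
x5 | 1 | 1 | 2 | 2
x6 | 2 | 2 | 2 | 2
x7 | 2 | 2 | 2 | 2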
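Step 1 can be checked by brute force. In this sketch (our own illustration, not the authors' software; the helper names `consistent`, `superfluous` and `reducts` are hypothetical), table T is read off rules r1 to r7, and an attribute subset is acceptable when rows that agree on it never disagree on the decision e. Run on T, it finds two reducts, {a, b, d} and {a, c, d}, so the core is {a, d}; the reduct {a, b, d} used in the rest of the example is one of them. The search is exponential in the number of attributes, which is fine for a toy table.

```python
from itertools import combinations

# Initial decision table T (rows x1..x7), read off rules r1..r7.
CONDITIONS = ("a", "b", "c", "d")
DECISION = "e"
T = [
    {"a": 1, "b": 0, "c": 1, "d": 1, "e": 1},
    {"a": 1, "b": 0, "c": 0, "d": 0, "e": 1},
    {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0},
    {"a": 1, "b": 1, "c": 0, "d": 1, "e": 0},
    {"a": 1, "b": 1, "c": 0, "d": 2, "e": 2},
    {"a": 2, "b": 2, "c": 0, "d": 2, "e": 2},
    {"a": 2, "b": 2, "c": 2, "d": 2, "e": 2},
]


def consistent(table, attrs):
    """True if rows that agree on `attrs` never disagree on the decision,
    i.e. every rule restricted to `attrs` is still true in the table."""
    seen = {}
    for row in table:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[DECISION]) != row[DECISION]:
            return False
    return True


def superfluous(table, attrs):
    """Attributes that can be deleted one at a time without creating conflicts."""
    return {a for a in attrs if consistent(table, [b for b in attrs if b != a])}


def reducts(table, attrs):
    """Minimal attribute subsets that keep the table consistent."""
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if consistent(table, subset) and not any(set(r) <= set(subset) for r in found):
                found.append(subset)
    return [set(r) for r in found]


reds = reducts(T, CONDITIONS)
core = set.intersection(*reds)  # the core is the intersection of all reducts
print("superfluous:", sorted(superfluous(T, CONDITIONS)))
print("reducts:", reds)
print("core:", sorted(core))
```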

STEP 2 – Simplification of Rules
Superfluous attributes, cores and reducts of the individual rules (rows x6 and x7 of T' coincide and yield the single rule r6):
Rule | Superfluous attr. | Core | Reducts
r1 | a, d | {b} | {b, a}, {b, d}
r2 | b, d | {a} | {a, b}, {a, d}
r3 | b, d | {a} | {a}
r4 | a | {b, d} | {b, d}
r5 | a, b | {d} | {d}
r6 | a, b, d | none | {a}, {b}, {d}
Table of rule reducts ($r_{ij}$ is the j-th reduct of rule $r_i$; x marks a deleted entry):
Reduct | a | b | d | e
r11 | 1 | 0 | x | 1
r12 | x | 0 | 1 | 1
r21 | 1 | 0 | x | 1
r22 | 1 | x | 0 | 1
r31 | 0 | x | x | 0
r41 | x | 1 | 1 | 0
r51 | x | x | 2 | 2
r61 | 2 | x | x | 2
r62 | x | 2 | x | 2
r63 | x | x | 2 | 2
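The rule-level reducts of Step 2 can be recomputed the same way. This sketch is our own illustration (the helpers `rule_holds` and `rule_reducts` are hypothetical names): a subset of a rule's conditions is kept when every row of T' matching the rule on that subset has the rule's decision, and the reducts are the minimal such subsets. The core of a rule is the intersection of its reducts, which is empty for r6.

```python
from itertools import combinations

# Reduced table T' over the chosen reduct {a, b, d}; e is the decision.
ATTRS = ("a", "b", "d")
T_PRIME = [
    {"a": 1, "b": 0, "d": 1, "e": 1},  # r1
    {"a": 1, "b": 0, "d": 0, "e": 1},  # r2
    {"a": 0, "b": 0, "d": 0, "e": 0},  # r3
    {"a": 1, "b": 1, "d": 1, "e": 0},  # r4
    {"a": 1, "b": 1, "d": 2, "e": 2},  # r5
    {"a": 2, "b": 2, "d": 2, "e": 2},  # r6 (x6 and x7 coincide)
]


def rule_holds(rule, subset, table):
    """True if every row matching `rule` on `subset` has the rule's decision."""
    return all(row["e"] == rule["e"] for row in table
               if all(row[a] == rule[a] for a in subset))


def rule_reducts(rule, table, attrs=ATTRS):
    """Minimal subsets of the rule's conditions that keep the rule true."""
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if rule_holds(rule, subset, table) and not any(set(r) <= set(subset) for r in found):
                found.append(subset)
    return [set(r) for r in found]


for i, rule in enumerate(T_PRIME, start=1):
    reds = rule_reducts(rule, T_PRIME)
    core = set.intersection(*reds)  # empty set means "no core"
    print(f"r{i}: reducts={reds} core={sorted(core)}")
```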

STEP 3 – Simplification of Final Rules
Each final rule has the form $z_i: p_i \to q_i$, where $q_i$ is one of the successors of the intermediate rules $r_{ij}$ (i.e. $e_0$, $e_1$ or $e_2$) and $p_i$ is the disjunction of the predecessors of all the rules $r_{ij}$ with successor $q_i$. To simplify a final rule, i.e. to find its reduct, we delete from its predecessor all the superfluous components of the disjunction $p_i$, i.e. the components that can be removed without changing the meaning $|p_i|$. During this procedure we have to determine the meanings $|f|$ of the various formulas $f$. Below we list the successive final rules $z_i$ and their reducts.
Final rule $z_1: p_1 \to q_1$, that is $(a_1 \wedge b_0) \vee (b_0 \wedge d_1) \vee (a_1 \wedge d_0) \to e_1$
Successor of the rule: $q_1 = e_1$, $|q_1| = \{x_1, x_2\}$
Predecessor of the rule: $p_1 = (a_1 \wedge b_0) \vee (b_0 \wedge d_1) \vee (a_1 \wedge d_0)$, $|p_1| = \{x_1, x_2\}$
$p_{11} = a_1 \wedge b_0$, $|p_{11}| = \{x_1, x_2\}$
$p_{12} = b_0 \wedge d_1$, $|p_{12}| = \{x_1\}$
$p_{13} = a_1 \wedge d_0$, $|p_{13}| = \{x_2\}$
Superfluous components: $\{p_{12}, p_{13}\}$
Necessary components: $\{p_{11}\}$
Reduct of the rule: $p_{11} \to q_1$, that is $a_1 \wedge b_0 \to e_1$

STEP 3 – Continuation
Final rule $z_2: p_2 \to q_2$, that is $a_0 \vee (b_1 \wedge d_1) \to e_0$
Successor of the rule: $q_2 = e_0$, $|q_2| = \{x_3, x_4\}$
Predecessor of the rule: $p_2 = a_0 \vee (b_1 \wedge d_1)$, $|p_2| = \{x_3, x_4\}$
$p_{21} = a_0$, $|p_{21}| = \{x_3\}$
$p_{22} = b_1 \wedge d_1$, $|p_{22}| = \{x_4\}$
Superfluous components: none
Necessary components: $\{p_{21}, p_{22}\}$
Reduct of the rule: $p_{21} \vee p_{22} \to q_2$, that is $a_0 \vee (b_1 \wedge d_1) \to e_0$ (no reduction occurred)
Final rule $z_3: p_3 \to q_3$, that is $a_2 \vee b_2 \vee d_2 \to e_2$
Successor of the rule: $q_3 = e_2$, $|q_3| = \{x_5, x_6, x_7\}$
Predecessor of the rule: $p_3 = a_2 \vee b_2 \vee d_2$, $|p_3| = \{x_5, x_6, x_7\}$
$p_{31} = a_2$, $|p_{31}| = \{x_6, x_7\}$
$p_{32} = b_2$, $|p_{32}| = \{x_6, x_7\}$
$p_{33} = d_2$, $|p_{33}| = \{x_5, x_6, x_7\}$
Superfluous components: $\{p_{31}, p_{32}\}$
Necessary components: $\{p_{33}\}$
Reduct of the rule: $p_{33} \to q_3$, that is $d_2 \to e_2$
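The reduction of the final disjunctive rules can also be sketched in Python. Under our own assumptions (hypothetical helpers `meaning`, `final_rule_reducts` and `shortest`; each disjunct encoded as an {attribute: value} conjunction), a reduct is a minimal set of disjuncts whose union of meanings on T' equals the meaning of the whole predecessor. For $z_1$ the search also finds $\{p_{12}, p_{13}\}$, since $|p_{12}| \cup |p_{13}| = \{x_1, x_2\}$ as well; picking the reduct with the fewest atomic formulas gives $\{p_{11}\}$, the slide's choice.

```python
from itertools import combinations

# Table T' (objects x1..x7 over the reduct {a, b, d}, decision e).
T_PRIME = {
    "x1": {"a": 1, "b": 0, "d": 1, "e": 1},
    "x2": {"a": 1, "b": 0, "d": 0, "e": 1},
    "x3": {"a": 0, "b": 0, "d": 0, "e": 0},
    "x4": {"a": 1, "b": 1, "d": 1, "e": 0},
    "x5": {"a": 1, "b": 1, "d": 2, "e": 2},
    "x6": {"a": 2, "b": 2, "d": 2, "e": 2},
    "x7": {"a": 2, "b": 2, "d": 2, "e": 2},
}


def meaning(conjunction, table=T_PRIME):
    """|p| for a conjunction given as {attribute: value}."""
    return {x for x, row in table.items()
            if all(row[a] == v for a, v in conjunction.items())}


def final_rule_reducts(components, table=T_PRIME):
    """Minimal sets of disjuncts whose union has the same meaning as the whole
    disjunction; the components left out of a reduct are superfluous."""
    target = set().union(*(meaning(c, table) for c in components.values()))
    found = []
    for size in range(1, len(components) + 1):
        for names in combinations(components, size):
            covered = set().union(*(meaning(components[n], table) for n in names))
            if covered == target and not any(set(f) <= set(names) for f in found):
                found.append(names)
    return [set(f) for f in found]


def shortest(components, reducts):
    """Pick the reduct with the fewest atomic formulas."""
    return min(reducts, key=lambda names: sum(len(components[n]) for n in names))


z1 = {"p11": {"a": 1, "b": 0}, "p12": {"b": 0, "d": 1}, "p13": {"a": 1, "d": 0}}
z2 = {"p21": {"a": 0}, "p22": {"b": 1, "d": 1}}
z3 = {"p31": {"a": 2}, "p32": {"b": 2}, "p33": {"d": 2}}
for name, z in (("z1", z1), ("z2", z2), ("z3", z3)):
    reds = final_rule_reducts(z)
    print(name, reds, "shortest:", shortest(z, reds))
```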

RESULT – Simplified Set of Rules
Finally we obtain three rules instead of seven. Except for the second rule, which is disjunctive, each rule contains fewer atomic formulas than the original rules. New rules:
$a_1 \wedge b_0 \to e_1$
$a_0 \vee (b_1 \wedge d_1) \to e_0$
$d_2 \to e_2$

CASE STUDY
Why Polish adjective declension? Answer: Polish adjective declension is an application domain with a well-defined boundary, i.e. one in which the total function generates all the combinatorial possibilities.

POLISH DECLENSION
Case = {Nominative, Accusative, Genitive, Dative, Instrumental, Locative}
Number = {singular, plural}
Gender = {masculine, feminine, neuter, X, Y, Z}*
In Polish school grammar, adjective declension amalgamates 3 “morphological categories”. In our experiment, we interpreted these categories as attributes of an information system (Rough Set Theory, Pawlak Z., 1982).
* X, Y, Z will be analyzed in the sequel.

THE PROBLEM OF GENDER IN POLISH
In Slavic languages, gender is a classificatory category for nouns but an inflectional category for adjectives. In order to elucidate the problem of gender in Polish noun morphology, we built a database of usages (not uses) of the proximal deictic adjectives. The root of these adjectives is very short: a single phoneme, t-.

THE DEICTIC MORPHEMES IN POLISH
The Nominative forms of the Polish morphemes with proximal (with respect to the speaker) deictic meaning are TEN, TA, TO. They correspond to:
Language | TEN | TA | TO
English | this | this | this
French | ce | cette | ce
German | dieser | diese | dieses
Japanese | kono | kono | kono

SAMPLES FROM OUR DATABASE
Some samples from the db (examples in the Nominative case only):
Gender | Polish singular | Polish plural | English translation
Feminine | ta deska | te deski | this/these board(s)
Feminine | ta gęś | te gęsi | this/these goose/geese
Feminine | ta pani | te panie | this/these lady/ladies
Masculine | ten dom | te domy | this/these house(s)
Masculine | ten pies | te psy | this/these dog(s)
Masculine | ten pan | ci panowie | this/these gentleman/gentlemen
Neuter | to pióro | te pióra | this/these feather(s)
Neuter | to kurczę | te kurczęta | this/these chicken(s)
Neuter | to dziecko | te dzieci | this/these child/children
Our database contains 108 different noun phrases combining all the values of the categories involved in the declension: Case, Number, Gender and Animacy.

Defining Gender in Polish: 7 “Genders”
In Polish linguistics (cf. SALONI, Z. 1976), gender is defined as a morpho-syntactic category. It is in the Accusative case that the gender forms of Polish adjectives are most differentiated. Sub-genders are distinguished in the singular and in the plural. Surprisingly, this has led to proposals of up to 7 gender classes:
Singular:
1. feminine (with a specific Accusative form)
2. neuter (with the same form in the Accusative as in the Nominative)
3. animal* masculine (with the same form in the Accusative as in the Genitive)
4. non-animal masculine (with the same form in the Accusative as in the Nominative)
Plural:
1. personal** masculine (with the same form in the Accusative as in the Genitive)
2. non-personal masculine (with the same form in the Accusative as in the Nominative)
3. “pluralia tantum”*** (with the same form in the Accusative as in the Nominative)
* “Animal” corresponds to the feature “animate” in descriptions of other European languages.
** “Personal” corresponds to the feature “human”.
*** Pluralia tantum are defective nouns with no singular form.

Defining Gender in Polish: 5 “Genders”
In fact, Saloni’s theory derives from that of Mańczak, W. (1956), who distinguished only the following five “sub-genders”:
1. personal masculine
2. animal masculine
3. non-animal masculine
4. feminine
5. neuter

DATABASE WITH 7 GENDERS
Nb of objects: 108
Nb of duplicates: 65
Nb of attributes: 3 (with respectively 2, 7, 6 values)
Nb -> {plur or sing}
Gnd -> {fem or mascAn or mascHum or mascInan or neu or nMasHum or plTant}
Case -> {A or D or G or I or L or N}
Theoretical combinations: 84
Apparent saturation index: 51.19%
Non-attested pairs of values: 10
If all non-attested pairs are inconsistent, the maximum number of combinations is: 54
Corrected saturation index: 79.63%
Our knowledge reduction algorithm cannot reduce the different descriptions; instead, 45 decision rules are proposed.
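The figures in this report are consistent with a simple reading of the two indices: 108 − 65 = 43 distinct descriptions, 2 × 7 × 6 = 84 theoretical combinations, 43/84 ≈ 51.19% and 43/54 ≈ 79.63%. The sketch below reproduces them. It is our reconstruction of the definitions, not Semana's code, and the function name `saturation_indices` is hypothetical; the maximum number of combinations (54 here) is passed in as reported, since the 10 non-attested pairs are not listed.

```python
from math import prod


def saturation_indices(n_objects, n_duplicates, value_counts, max_combinations=None):
    """Apparent and corrected saturation indices (our reading of the report):
    distinct descriptions divided by the number of theoretical combinations,
    and by the smaller maximum obtained when non-attested pairs of values
    are treated as impossible."""
    distinct = n_objects - n_duplicates
    theoretical = prod(value_counts)  # product of the attribute domain sizes
    apparent = distinct / theoretical
    corrected = distinct / (max_combinations or theoretical)
    return theoretical, apparent, corrected


theoretical, apparent, corrected = saturation_indices(108, 65, [2, 7, 6], max_combinations=54)
print(theoretical, f"{apparent:.2%}", f"{corrected:.2%}")
```

The same function applied to the first trial below, `saturation_indices(108, 0, [6, 2, 3, 2, 2], 108)`, gives 144 combinations, 75% and 100%.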

CRITICAL REMARKS ON SUB-GENDERS
We observed that the 5, 6, 8 or 9 “sub-genders” of Polish school grammars correspond (a) neither to any known semantic or ontological categories (b) nor to any known grammatical sub-gender in other languages. In inflectional languages, the morphological amalgamation of several different categories in one single form may make it difficult to discern properly the semantic categories in question.

ANALYSIS of GENDER SUBCATEGORIZATION in POLISH GRAMMAR

DB building
Using our “Dynamic db Builder”, each entry (a morpheme sample) is described by (attribute, value) pairs: the features chosen for that entry.

Multi-valued Contingency Table The 108 samples are collected into a Multi-valued Contingency Table

FIRST TRIAL SPLITTING GENDER
Observing the singular/plural oppositions in adjective declension, we first split the Gender attribute, with its 7 “sub-gender” values, into 3 attributes with fewer values each:
gender = {feminine, neuter, masculine}
animacy = {animate, inanimate}
humanity = {human, non-human}

FIRST TRIAL - RESULTS SPLITTING GENDER
Objects: 108
Duplicates: 0
Duplicate ratio: 0%
Attributes: 5 (with respectively 6, 2, 3, 2, 2 values): case, number, gender, animacy and humanity
Theoretical combinations: 144
Apparent saturation index: 75%
Non-attested pairs of values (1): inanimate, human, 2, 4
If all non-attested pairs were inconsistent, the maximum number of combinations would be: 108
Corrected saturation index: 100%
The following pairs of attributes could be merged:
[HUM|INA] confidence index = 99.9%
[HUM|nHUM] confidence index = 99.9%
[INA|nHUM] confidence index = 99.9%
Our knowledge reduction algorithm reduces the 108 different descriptions to 34 decision rules.
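Where the figure of 108 comes from can be checked by enumeration: every combination containing the non-attested pair (inanimate, human) is excluded, which removes 6 × 2 × 3 = 36 of the 144 combinations. In this sketch the value labels and the helper `possible_combinations` are our own (hypothetical); only the domain sizes and the excluded pair come from the report.

```python
from itertools import product

# Attribute domains of the first trial (6, 2, 3, 2, 2 values); labels are ours.
DOMAINS = {
    "case": ["nom", "acc", "gen", "dat", "ins", "loc"],
    "number": ["sing", "plur"],
    "gender": ["fem", "masc", "neu"],
    "animacy": ["animate", "inanimate"],
    "humanity": ["human", "nonhuman"],
}
# The single non-attested pair reported by the analyser.
NON_ATTESTED = [(("animacy", "inanimate"), ("humanity", "human"))]


def possible_combinations(domains, forbidden_pairs):
    """Yield attribute-value combinations that contain no forbidden pair."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        combo = dict(zip(names, values))
        if not any(combo[a] == va and combo[b] == vb
                   for (a, va), (b, vb) in forbidden_pairs):
            yield combo


theoretical = sum(1 for _ in possible_combinations(DOMAINS, []))
maximum = sum(1 for _ in possible_combinations(DOMAINS, NON_ATTESTED))
print(theoretical, maximum)  # 144 108
```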

SECOND TRIAL MERGING ANIMACY with HUMANITY
Considering the results of the first trial, namely:
- that one pair of values (inanimate and human) is not attested in the db (in fact, this pair is clearly contradictory): Non-attested pairs of values (1): inanimate, human, 2, 4
- and the confidence indices computed for merging: [HUM|INA] = 99.9%, [HUM|nHUM] = 99.9%, [INA|nHUM] = 99.9%
we decided to merge the two binary attributes ANIMACY and HUMANITY into one three-valued attribute as follows:
ANIMACY-*-{ANY}=[nhuman|inanimate|human]
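The merge can be pictured as replacing two binary attributes by one attribute whose values are their attested value pairs. This is a minimal sketch under our own assumptions (the helper `merge_binary` and the pair labels are hypothetical); the three surviving pairs are given the value names used by Semana, and the new domain size brings the theoretical combinations down to 6 × 2 × 3 × 3 = 108.

```python
from itertools import product

ANIMACY = ["animate", "inanimate"]
HUMANITY = ["human", "nonhuman"]
NON_ATTESTED = {("inanimate", "human")}  # the contradictory pair


def merge_binary(attr1, attr2, non_attested):
    """Merge two attributes into one whose values are the attested value pairs."""
    return [pair for pair in product(attr1, attr2) if pair not in non_attested]


merged = merge_binary(ANIMACY, HUMANITY, NON_ATTESTED)
# Label the three surviving pairs with the value names of the second trial.
LABELS = {("animate", "human"): "human",
          ("animate", "nonhuman"): "nhuman",
          ("inanimate", "nonhuman"): "inanimate"}
print([LABELS[p] for p in merged])
# Theoretical combinations: case x number x gender x merged animacy.
print(6 * 2 * 3 * len(merged))
```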

SECOND TRIAL - RESULTS MERGING ANIMACY with HUMANITY
Nb of objects: 108
Nb of duplicates: 0
Duplicate ratio: 0%
Nb of attributes: 4 (with respectively 2, 3, 3 and 6 values)
Nb --> {plur or sing}
Gnd --> {fem or masc or neu}
Anim --> {inanim or anim or animHum}
Case --> {A or D or G or I or L or N}
Theoretical combinations: 108
Apparent saturation index: 100%
Non-attested pairs of values: 0
Corrected saturation index: 100%
Again, our knowledge reduction algorithm reduces the 108 different descriptions to 34 decision rules.

Establishing an ANIMACY CATEGORY for Polish Grammar

KNOWLEDGE REDUCTION using SEMANA The knowledge reduction algorithm reduces the 108 different descriptions of Polish Proximal Deictic Morphemes to 34 decision rules.

34 Morphological Rules
r1 (9) : CASdat,NBRplu --> tym
r2 (3) : CASins,GNDmas,NBRsin --> tym
r3 (3) : CASins,GNDneu,NBRsin --> tym
r4 (3) : CASloc,GNDmas,NBRsin --> tym
r5 (3) : CASloc,GNDneu,NBRsin --> tym
r6 (9) : CASins,NBRplu --> tymi
r7 (1) : CASacc,ANYhum,GNDmas,NBRplu --> tych
r8 (9) : CASgen,NBRplu --> tych
r9 (9) : CASloc,NBRplu --> tych
r10 (3) : CASacc,GNDneu,NBRsin --> to
r11 (3) : CASnom,GNDneu,NBRsin --> to
r12 (3) : CASacc,ANYina,NBRplu --> te
r13 (3) : CASacc,ANYnhu,NBRplu --> te
r14 (3) : CASacc,GNDfem,NBRplu --> te
r15 (3) : CASacc,GNDneu,NBRplu --> te
r16 (3) : CASnom,ANYina,NBRplu --> te
r17 (3) : CASnom,ANYnhu,NBRplu --> te
r18 (3) : CASnom,GNDfem,NBRplu --> te
r19 (3) : CASnom,GNDneu,NBRplu --> te
r20 (1) : CASacc,ANYina,GNDmas,NBRsin --> ten
r21 (3) : CASnom,GNDmas,NBRsin --> ten
r22 (3) : CASdat,GNDmas,NBRsin --> temu
r23 (3) : CASdat,GNDneu,NBRsin --> temu
r24 (3) : CASdat,GNDfem,NBRsin --> tej
r25 (3) : CASgen,GNDfem,NBRsin --> tej
r26 (3) : CASloc,GNDfem,NBRsin --> tej
r27 (1) : CASacc,ANYhum,GNDmas,NBRsin --> tego
r28 (1) : CASacc,ANYnhu,GNDmas,NBRsin --> tego
r29 (3) : CASgen,GNDmas,NBRsin --> tego
r30 (3) : CASgen,GNDneu,NBRsin --> tego
r31 (3) : CASacc,GNDfem,NBRsin --> te*
r32 (3) : CASnom,GNDfem,NBRsin --> ta
r33 (3) : CASins,GNDfem,NBRsin --> ta*
r34 (1) : CASnom,ANYhum,GNDmas,NBRplu --> ci
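As a check on the rule list, the sketch below parses the 34 rules exactly as printed and applies them to every combination of the four attributes. The parsing format (a three-letter attribute code followed by a value code) is our reading of Semana's notation, and `parse_rules` and `predict` are hypothetical helpers. On this transcription, each of the 108 descriptions is matched by at least one rule, all matching rules agree on the morpheme, and 20 rules concern the singular and 14 the plural.

```python
import re
from itertools import product

# The 34 rules as listed: "rN (support) : conditions --> morpheme".
RULES_TEXT = """\
r1 (9) : CASdat,NBRplu --> tym
r2 (3) : CASins,GNDmas,NBRsin --> tym
r3 (3) : CASins,GNDneu,NBRsin --> tym
r4 (3) : CASloc,GNDmas,NBRsin --> tym
r5 (3) : CASloc,GNDneu,NBRsin --> tym
r6 (9) : CASins,NBRplu --> tymi
r7 (1) : CASacc,ANYhum,GNDmas,NBRplu --> tych
r8 (9) : CASgen,NBRplu --> tych
r9 (9) : CASloc,NBRplu --> tych
r10 (3) : CASacc,GNDneu,NBRsin --> to
r11 (3) : CASnom,GNDneu,NBRsin --> to
r12 (3) : CASacc,ANYina,NBRplu --> te
r13 (3) : CASacc,ANYnhu,NBRplu --> te
r14 (3) : CASacc,GNDfem,NBRplu --> te
r15 (3) : CASacc,GNDneu,NBRplu --> te
r16 (3) : CASnom,ANYina,NBRplu --> te
r17 (3) : CASnom,ANYnhu,NBRplu --> te
r18 (3) : CASnom,GNDfem,NBRplu --> te
r19 (3) : CASnom,GNDneu,NBRplu --> te
r20 (1) : CASacc,ANYina,GNDmas,NBRsin --> ten
r21 (3) : CASnom,GNDmas,NBRsin --> ten
r22 (3) : CASdat,GNDmas,NBRsin --> temu
r23 (3) : CASdat,GNDneu,NBRsin --> temu
r24 (3) : CASdat,GNDfem,NBRsin --> tej
r25 (3) : CASgen,GNDfem,NBRsin --> tej
r26 (3) : CASloc,GNDfem,NBRsin --> tej
r27 (1) : CASacc,ANYhum,GNDmas,NBRsin --> tego
r28 (1) : CASacc,ANYnhu,GNDmas,NBRsin --> tego
r29 (3) : CASgen,GNDmas,NBRsin --> tego
r30 (3) : CASgen,GNDneu,NBRsin --> tego
r31 (3) : CASacc,GNDfem,NBRsin --> te*
r32 (3) : CASnom,GNDfem,NBRsin --> ta
r33 (3) : CASins,GNDfem,NBRsin --> ta*
r34 (1) : CASnom,ANYhum,GNDmas,NBRplu --> ci
"""

RULE_RE = re.compile(r"r(\d+) \((\d+)\) : (\S+) --> (\S+)")


def parse_rules(text):
    """Turn each line into (rule id, support, {attribute: value}, morpheme)."""
    rules = []
    for line in text.strip().splitlines():
        rid, support, lhs, morpheme = RULE_RE.fullmatch(line.strip()).groups()
        conditions = {tok[:3]: tok[3:] for tok in lhs.split(",")}
        rules.append((int(rid), int(support), conditions, morpheme))
    return rules


RULES = parse_rules(RULES_TEXT)

# Attribute values of the second-trial database (6 x 2 x 3 x 3 = 108).
DOMAIN = {
    "CAS": ["nom", "acc", "gen", "dat", "ins", "loc"],
    "NBR": ["sin", "plu"],
    "GND": ["fem", "mas", "neu"],
    "ANY": ["ina", "nhu", "hum"],
}


def predict(description):
    """Return the single morpheme predicted by all rules whose conditions hold."""
    forms = {m for _, _, cond, m in RULES
             if all(description[a] == v for a, v in cond.items())}
    if len(forms) != 1:
        raise ValueError(f"{description}: matching forms {forms}")
    return forms.pop()


# Coverage and consistency over all 108 descriptions (raises on a gap or clash).
all_descriptions = [dict(zip(DOMAIN, values)) for values in product(*DOMAIN.values())]
predictions = {tuple(d.values()): predict(d) for d in all_descriptions}
print(len(predictions), predict({"CAS": "nom", "NBR": "plu", "GND": "mas", "ANY": "hum"}))
```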

DISCOVERED KNOWLEDGE
1. All the 108 different descriptions can be represented by only 34 rules.
2. 20 rules represent the singular forms and 14 rules represent the plural forms.
3. The Gender attribute is not necessary in 8 plural rules (r1, r6, r8, r9, r12, r13, r16, r17); in particular, no plural rule for the Genitive, Dative, Instrumental or Locative mentions Gender. This confirms the generally observed fact that in Polish, gender is neutralized (no gender distinction) in the plural oblique cases.
4. The attribute “Animacy” is present in 9/34 rules and 17/108 samples.
3 rules contain the value human (hum):
r07 (1) : CASacc,ANYhum,GNDmas,NBRplu --> tych
r27 (1) : CASacc,ANYhum,GNDmas,NBRsin --> tego
r34 (1) : CASnom,ANYhum,GNDmas,NBRplu --> ci
3 rules contain the value inanimate (ina):
r20 (1) : CASacc,ANYina,GNDmas,NBRsin --> ten
r12 (3) : CASacc,ANYina,NBRplu --> te
r16 (3) : CASnom,ANYina,NBRplu --> te
3 rules contain the value non-human (nhu):
r17 (3) : CASnom,ANYnhu,NBRplu --> te
r13 (3) : CASacc,ANYnhu,NBRplu --> te
r28 (1) : CASacc,ANYnhu,GNDmas,NBRsin --> tego

GENDER and ANIMACY
The 7-genders theory proposed too coarse-grained an analysis of the domain, using only one attribute supposed to represent the Gender category. In our “first trial”, in addition to Gender, two binary categories (Human and Animate) were introduced, resulting in fact in too fine-grained a description of the domain. In our “second trial”, after merging the two binary categories, we obtained one three-valued Animacy category. As a result, the Analyser (1) detects none of the following anomalies: duplicates (of usages, not uses) and non-attested pairs of values, and (2) proposes no attribute-merging possibilities. Needless to say, our theory takes into account the definition of the Gender category as it is generally used in grammars of other languages.

The ONTOLOGICAL STRUCTURE of ANIMACY
Interestingly, we noticed that since the feature structure of the Animacy attribute is a binary tree, its values are naturally mutually exclusive, by the law of non-contradiction: nothing can be true and false at the same time.
ANIMACY
  - : non-animate
  + : HUMANITY
        - : non-human
        + : human

RELATIVE WEIGHT OF THE ANIMACY ATTRIBUTE
If we consider the relative weight of the ANIMACY attribute (only 5.4%), we can better understand the difficulties that Polish linguists encountered in their work.
[Table: relative weight (%) of the attributes CAS, NBR, GND, ANY]
It becomes clear that ANIMACY is not as important a category as the other three (Case, Number and Gender), which co-occur in the amalgamated adjective paradigm.
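The 5.4% figure for ANIMACY can be reproduced from the 34 rules under one assumption: that an attribute's weight is its share of all attribute occurrences in the rules, each occurrence counted with the rule's support (the number in parentheses). We do not know that this is how Semana defines the weight; the sketch only shows that this definition is consistent with the figure (ANY: 17 out of 116 + 116 + 68 + 17 = 317 weighted occurrences). The grouping of rules and the function name `attribute_weights` are ours.

```python
# Rules grouped by the attributes their predecessors mention, with the support
# of each rule copied from the list r1..r34.
RULE_GROUPS = {
    ("CAS", "NBR"): [9, 9, 9, 9],                   # r1, r6, r8, r9
    ("CAS", "ANY", "NBR"): [3, 3, 3, 3],            # r12, r13, r16, r17
    ("CAS", "ANY", "GND", "NBR"): [1, 1, 1, 1, 1],  # r7, r20, r27, r28, r34
    ("CAS", "GND", "NBR"): [3] * 21,                # the remaining 21 rules
}


def attribute_weights(groups):
    """Percentage share of support-weighted attribute occurrences (assumed definition)."""
    totals = {}
    for attrs, supports in groups.items():
        for attr in attrs:
            totals[attr] = totals.get(attr, 0) + sum(supports)
    grand_total = sum(totals.values())
    return {attr: 100 * n / grand_total for attr, n in totals.items()}


weights = attribute_weights(RULE_GROUPS)
for attr, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{attr}: {w:.1f}%")
```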

Correspondence Factor Analysis (CFA)
The numbers in the table are considered as coordinates of points in an N-dimensional space.
[Figure: a cloud of points in a space with axes x, y, z, and its axes of inertia F1, F2, F3]
CFA calculates the axes of inertia of the cloud of points (F1, F2, F3, …) and displays projections onto the planes [F1,F2], [F1,F3], etc. CFA is implemented as “Stat-3” in “Semana”.
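Semana's Stat-3 implementation is not described here, so the sketch below follows, as an assumption, the standard textbook formulation of correspondence analysis: a singular value decomposition of the table's standardized residuals. The squared singular values are the inertias of the axes F1, F2, …, their shares are the “inertia” percentages reported for each axis, and the principal coordinates are what gets projected onto planes such as [F1, F2]. The function name and the 3 × 3 table are hypothetical toy values, not data from the study.

```python
import numpy as np


def correspondence_analysis(table):
    """Textbook correspondence analysis of a contingency table.

    Returns the principal inertias, the share of total inertia carried by each
    axis, and the principal coordinates of the rows and of the columns.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                        # inertia of each axis
    share = inertia / inertia.sum()
    rows = (U * sv) / np.sqrt(r)[:, None]    # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None] # column principal coordinates
    return inertia, share, rows, cols


# Hypothetical 3 x 3 contingency table (not data from the study).
counts = [[20, 5, 3], [4, 18, 6], [2, 7, 25]]
inertia, share, rows, cols = correspondence_analysis(counts)
print(np.round(share, 3))
```

The total inertia equals the table's chi-square statistic divided by its grand total, and the last axis carries (numerically) zero inertia, because the residual matrix loses one dimension once the row and column masses are fixed.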

Projection in plane [1,2]
[Figure: projection onto factorial plane [1,2]; horizontal: axis 2 (inertia 12.81%), vertical: axis 1 (inertia 13.05%); points: attribute values and morphemes]
Legend: qualifiers = animacy, gender; quantifiers = number; syntactic relators = cases; morphemes.

[Figure: projection of attribute values and morphemes onto factorial axes 1 and 4]
Morphemes ta*, te*, tej and ta are associated only with the feminine; morphemes tego, to, ten, temu and ci are associated only with the masculine or the neuter. Again, tym is ambiguous and may be associated with any gender.
Axis 4 separates gender: feminine vs {masculine, neuter}.
Note that animacy is still not differentiated on axis 4. It is differentiated only on axis 9!

Differentiation of Animacy does not appear before factor 9
[Table: FREQ, QLT and INR, then COR and CTR on factors F#1 to F#9, for the attribute values hum, ina, nhu, acc, dat, gen, ins, loc, nom, fem, mas, neu, plu, sin and the morphemes ci, ta, ta*, te, te*, tego, tej, temu, ten, to, tych, tym, tymi]
Animacy first appears on factor 9.

[Figure: projection onto factorial plane [1,9]; horizontal: axis 1 (inertia 13.05%), vertical: axis 9 (inertia 4.35%)]
Axis 9 separates human vs {non-human, inanimate}: morpheme ci applies only to human entities.

Masculine is “unmarked” in Polish utterances
Matka i dziecko nie mogli się sobą nacieszyć. (The mother and the child could not get enough of each other.)
matka: NOUN, fem, hum “mother”; dziecko: NOUN, neu, hum “child”; mogli: VERB, mas, hum “could”
The verb takes the masculine form although neither of the conjoined subject nouns is masculine.
Obviously, neither of the statements below can be true:
- *GENDER is a subcategory of ANIMACY
- *ANIMACY is a subcategory of GENDER
On the contrary, it is easy to admit that HUMAN is a subcategory of ANIMACY. We claim that attributes with “heterogeneous” values do not exist. Consequently, the presumed “syncretism” of GENDER and ANIMACY is meaningless.

Comparing Theories of Polish Gender
Theories (number of genders) and their grammaticalized attributes:
- Mańczak W. (5), Saloni Z. (7), Woliński M. (8, 2003), Przepiórkowski A. (9): a single GENDER attribute, with values drawn from feminine, neuter, non-animal masculine, animal masculine, personal masculine, non-personal masculine and “pluralia tantum”.
- This proposal: GENDER (3 values: feminine, neuter, masculine) and ANIMACY (3 values: inanimate, non-human animate, human animate).