Presentation is loading. Please wait.

Presentation is loading. Please wait.

1.Linguistics - State of the ArtLinguistics - State of the Art 2.What is Interactive Modelling ?What is Interactive Modelling ? 3.Knowledge Discovery in.

Similar presentations


Presentation on theme: "1.Linguistics - State of the ArtLinguistics - State of the Art 2.What is Interactive Modelling ?What is Interactive Modelling ? 3.Knowledge Discovery in."— Presentation transcript:

1 1.Linguistics - State of the ArtLinguistics - State of the Art 2.What is Interactive Modelling ?What is Interactive Modelling ? 3.Knowledge Discovery in Databases (KDD)Knowledge Discovery in Databases (KDD) 4.Interactive LinguisticsInteractive Linguistics 5.The SEMANA softwareThe SEMANA software 6. Data Representation Data Representation 7.Attributive & Relational KnowledgeAttributive & Relational Knowledge 8. Some Results of Formal Concept Analysis (FCA) Some Results of Formal Concept Analysis (FCA) MAIN TOPICS

2 LINGUISTICS (State of the Art)

3 “Form” and “Matter” in Structural Linguistics OBJECTAPPROACHTHEORYFORM(Structures)typesuniversalhomogeneousDeductive Synthesis using rules L = (W, G) generated Language is a set of sentences generated by grammar rules G from words W Prediction MATTER(Data)instancesspecificheterogeneousInductive Analysis of analogies L = (W, L) analysed Language is a set of sentences L analysed as words W Explanation ALTMAN G. (1987) "The Levels of Linguistic Investigation", Theoretical Linguistics, vol. 14, edited by H. Schnelle, W. de Guyter, Berlin - New York

4 Structural and Computational Linguistics Structural Linguistics Computational Linguistics FORM(Structures) THEORY-oriented Linguistics (Formal Generative Linguistics) Natural Language Processing (Lexicon-Functional Grammars, Unification Grammars, Logic Grammars) MATTER(Data) DATA-oriented Linguistics (Linguistic Typology) Human Language Technology (Corpus Linguistics, Lexicons and Thesauri - WordNet, FrameNet etc.)

5 INTERACTIVE LINGUISTICS André WLODARCZYK

6 What is Interactive Modelling ?

7 Standard Research Procedure Theory Domain of interest Meta-Theory Researcher

8 Meta-Theory (formal) Theory (formal) Abstraction (Describe raw data) Meta-Abstraction (Extend the meta-theory) Formalization stage I (Interpret the Meta-Theory) Formalization stage II (Interpret the Theory) Verification (Evaluate the Model/Theory) DOMAIN of Interest Modeling Tasks in a loop Metaphor Model (formal) Simplification (Compress the Model)

9 KNOWLEDGE DISCOVERY IN DATABASES (KDD)

10 Characteristics of KDD 1.Tasks (visualization, classification, clustering, regression etc.) 2.Structure of the Model adapted to data (it determines the limits of what will be compared or revealed) 3.Evaluation function (adequacy / correspondence and generalization problems) 4.Search or Optimalization Methods (heart of data exploration algorithms) 5.Data Management Techniques (tools for data accumulation and indexation). HAND David, MANNILA Heikki & SMYTH Padhraic (2001) Principles of Data Mining, Massachussetts Institute of Technology, USA

11 KDD Procedure From Data Mining to Knowledge Discovery in Databases by Usama Fayyad, Gregory Piatetsky- Shapiro, and Padhraic Smyth, AI Magazine 1997 (American Association for Artificial Intelligence)

12 INTERACTIVE LINGUISTICS

13 In language studies this method is issued from the integration of Text Mining and Analysis (Data Mining). A utomated D iscovery S ystems A utomated D iscovery S ystems D ata B ase M anagement S ystems D ata B ase M anagement S ystems KDD in the Linguistic Domain Corpus Linguistics Interactive Linguistics NLP Natural Language Processing NLP HLT Human Language Technologies HLT

14 Text Mining and Data Mining From Data Mining to Knowledge Discovery in Databases by Usama Fayyad, Gregory Piatetsky- Shapiro, and Padhraic Smyth, AI Magazine 1997 (American Association for Artificial Intelligence) Text Mining Data Mining 1. This task needs active involvement on behalf of the researcher. 2. This task is automatic. 3. Interactive tasks with KDD algorithms (Rough Set, FCA, etc.)

15 OBJECTS – APPROACHES - TASKS Object s Text Data Symbolic Data Approaches Corpus Linguistics Text Document Exploration (Text Mining) Interactive Linguistics Linguistic Knowledge Extraction (Data Mining (Data Mining) Tasks 1. Selection 2a. Preprocessing 2b. Filtering 3. Transformation 4. Analysis 5. Evaluation

16 The SEMANA software

17 Application for Apple, Windows-PC and Linux computers Application for Apple, Windows-PC and Linux computers Charts (various formats) “Multi-valued tables” “One-valued tables” Rough Set Theory Decision Logic Upper approximation, Lower approximation, Reducts, Core, Discriminating power Minimal rules, Attribute strength (Pawlak, Skowron & Polkwski) & Polkwski) (Bolc, Cytowski & Stacewicz) Formal Concept Analysis Statistical tools Galois lattice, “central concepts” Correlation Matrix, Correspondence Factor, Analysis, Hierarchical, Classifications (Wille, Ganter) (Benzécri) Dynamic DB Builder Data sheets Data coding Data storage Attribute Editor Discretisation Logical scaling … Tree Builder Aid to code structuring Architecture of SEMANA

18 Logical Anlyses in SEMANA Symbolic Data Analysis InterpretationEvaluation Set Theory & Lattice Theory Decision Rules Formal Contexts Lattices Formal Concept Analysis Rudolf WILLE, 1982 Logics Rough Set Theory Zdzisław PAWLAK, 1981 Proposition Logic Various Other Tools for Data Analysis & Data Processing : Approximation, Analogy, Discrimination Power, Imputation etc. Statistical Data Analysis 1. Factor Analysis 2. Bottom-up Hierarchical classification RST RFCA FCA STAT 3 RSA DL Decision Logic

19 Data Representation

20 ObjectsAttributes :=:= *) Context by Wille R. – 1982 (Formal Concept Analysis)  “Single-valued charts” **) Information System by Pawlak Z. – 1982 (Rough Set Theory)  “Multi-valued charts” Assignment of Attributes to Objects

21 Objects and Attributes SYSTEM NAME OBJECTATTRIBUTERELATIONAuthor Data Base ArgumentPredicateRelation Codd, 1969 Chu Space Point/Indi vidual State Assignment ( := ) Barr & Chu, 1979 ElementPredicate Dual Relationship Pratt ContextExtentIntent Assignment ( := ) Wille, 1982 Information System ObjectAttribute Assignment ( := ) Pawlak, 1982 General System ObjectSortSignature Pogonowski, 1982 ClassificationTokenType Satisfaction ( ⊫ ) Barwise & Seligmann, 1997 InstitutionSentenceModel Satisfaction ( ⊫ ) Goguen, 2004

22 distinction 2similarity 2 distinction 1 similarity1 Following the definitions of ‘Similarity’ and ‘Distinction’ by Jerzy Pogonowski (1991), Linguistic Oppositions, UAM Scientific Editions, Poznań, pp. 125 ALL features are common NO features are common SOME features are common SOME features are NOT common Similarity & Distinction

23 Identity Difference Similarit y stron g weak stron g Distinction Identity and Difference

24 Linguistic Opposition Comparison of signs can be analysed in two dual continua which have identity (equivalence) and difference as their extreme cases.SimilarityDistinction Close Senses strongweak Distant Senses weakstrong Oppositions : Morphemes are organised by pairs of similarity and distinction. Structural linguists proposed 3 kinds of oppositions: privative (binary), equipollent (multi-value) and gradual (degree). Oppositions : Morphemes are organised by pairs of similarity and distinction. Structural linguists proposed 3 kinds of oppositions: privative (binary), equipollent (multi-value) and gradual (degree).

25 Sign Semion Sense Usage The Sign is a structure with usages U as objects and descriptions F as formulae. Sign = Sign = Let F be a set of atomic formulae F={ ϕ  χ, ψ…}and let  be a subset of formulae in F (i.e.:  ⊆  F). Let X={x, y, z...} be a subset of U (i.e.: X ⊆ U). We define a Semion as a formal concept in the FCA (Formal Concept Analysis) - a substructure (pair of sets: a set of uses X and a set of formulae  ) Semion = where X ⊆ U and  ⊆  F The Usage of a semion is defined as its Extent (extension); i.e.: a typified set (class) of uses. ||  || Sign = {x ∈  X : x ⊨ Sign  } The Sense of a semion is defined as its Intent (intension). ||X|| Sign = { ϕ ∈  ϕ ⊨ Sign X}

26 Attributive & Relational Knowledge

27 Semantic Network Collins & Quilian (1969) Collins, A. M., & Quillian, M. R. (1969). Retrieval Time from Semantic Memory. Journal of Verbal Learning and Verbal Behavior, 8,

28 Connectionist (neural) System Rumelhart & Todd (1993)

29 Some kind of Attributive Knowledge is similar to the Connectionnist one Two Conditions: 1. The initial System must contain Alternatives Attributes (multi- valued attributes) 2. The initial System must be complete (i.e. all cases present)

30 Relational Knowledge Example: Ontology Coordination using Information Flow Logic Marco Schorlemmer and Yannis Kalfoglou, « A Channel-Theoretic Foundation for Ontology Coordination »

31 Attributive Knowledge ‘Ohio’ is big then it is said to be a “river” in English. ‘Ohio’ is tributary then it is said to be a “rivière” in French.

32 Some Results of FCA (Formal Concept Analysis)

33 fcaBigEarsBlueEyesFlatNoseRoundFaceBald Jimxx0xx John0x00x Bobx0xxx Max00xx0 “Family Resemblance” Lattice of Formal Concepts LIST OF FORMAL CONCEPTS C1 {},{BigEars,BlueEyes,FlatNose,RoundFace,Bald} C2 {Bob},{BigEars,FlatNose,RoundFace,Bald} C3 {Bob,Max},{FlatNose,RoundFace} C4 {Jim},{BigEars,BlueEyes,RoundFace,Bald} C5 {Jim,Bob},{BigEars,RoundFace,Bald} C6 {Jim,Bob,Max},{RoundFace} C7 {Jim,John},{BlueEyes,Bald} C8 {Jim,John,Bob},{Bald} C9 {Jim,John,Bob,Max},{} **************************************** 2 intensional master-concept(s) : C2 {Bob},{BigEars,FlatNose,RoundFace,Bald} C4 {Jim},{BigEars,BlueEyes,RoundFace,Bald} **************************************** 2 extensional master-concept(s) : C6 {Jim,Bob,Max},{RoundFace} C8 {Jim,John,Bob},{Bald} **************************************** CENTRAL FORMAL CONCEPT : C5 {Jim,Bob},{BigEars,RoundFace,Bald} ****************************************

34 fcaBigEarsBlueEyesFlatNoseRoundFaceBald Jimxx0xx John0x00x Bobx0xxx Max00xx0 “Family Resemblance” Multi-base Classes CLASS STRUCTURE Classe 1: Bob C2 {Bob},{BigEars,FlatNose,RoundFace,Bald} C3 {Bob,Max},{FlatNose,RoundFace} Classe 2: Jim C4 {Jim},{BigEars,BlueEyes,RoundFace,Bald} C7 {Jim,John},{BlueEyes,Bald} Classe 3: Jim,Bob C5 {Jim,Bob},{BigEars,RoundFace,Bald} C6 {Jim,Bob,Max},{RoundFace} C8 {Jim,John,Bob},{Bald} All Formal Concepts included You have selected the formal concepts: C2 {Bob},{BigEars,FlatNose,RoundFace,Bald} C4 {Jim},{BigEars,BlueEyes,RoundFace,Bald} The meet is: C1 {},{BigEars,BlueEyes,FlatNose,RoundFace,Bald} The join is: C5 {Jim,Bob},{BigEars,RoundFace,Bald} Similarity index = 0.75 according to Zhao, Wang & Halang, 2006 (after Tversky's model) You have selected the formal concepts: C2 {Bob},{BigEars,FlatNose,RoundFace,Bald} C4 {Jim},{BigEars,BlueEyes,RoundFace,Bald} The meet is: C1 {},{BigEars,BlueEyes,FlatNose,RoundFace,Bald} The join is: C5 {Jim,Bob},{BigEars,RoundFace,Bald} Similarity index = 0.75 according to Zhao, Wang & Halang, 2006 (after Tversky's model)

35 Double Binary Opposition +FEM +HUM -FEM -HUM ły1 li2 ły2li1 Psy stały. Pociągi stały. Dzieci stały. Ludzie stali. Matka i dziecko stali. Panowie stali. Panie stały.

36  : GA ---> WA  : WA ---> GA - New In a BASE UTTERANCE : - “GA” (ga1) is a marker of the Attention-driven Phrase (Subject with the status: ‘New’ ) -“WA” (wa2) is a marker of the Attention-driven Phrase (Subject with status ‘non-New’) ga1 wa2 New Old - Old In an EXTENDED UTTERANCE : - “WA” (wa1) is a marker of the Attention- driven Phrase (Topic with the status: ‘Old’ ) -“GA” (ga2) is a marker of the Attention- driven Phrase (Focus with the status: ‘New’ ) ga2 wa1 Converse Opposition (my old name : “Boomerang Opposition”)

37  : GA WA  : WA GA WA GA ga1 WA wa1 ga2 wa2 GA Infomorphism of the Japanese ‘wa’ and ‘ga’ particles WA ⇄ GA infomorphism OLD + wa + NEW OLD +wa + OLD NEW + ga + OLD NEW + ga + NEW

38 The MIC Theory Conceptual Lattice View

39 © André WLODARCZYK


Download ppt "1.Linguistics - State of the ArtLinguistics - State of the Art 2.What is Interactive Modelling ?What is Interactive Modelling ? 3.Knowledge Discovery in."

Similar presentations


Ads by Google