1 1 The challenge of non-discreteness: Focal structure in language Stockholm, August 31, 2012 Andrej A. Kibrik (Institute of Linguistics RAN and Lomonosov Moscow State University)

2 2 The problem: We tend to think about language as a system of discrete, segmental units (phonemes, morphemes, words, sentences...). But this view does not survive an encounter with reality.

3 3 Simple example: morpheme fusion. Russian adjective детский 'children's, childish': det-sk-ij, child-Attr-M.Nom, Root-Suffix-Ending. Pronounced [ deck-ij ]: the affricate [c] belongs to both the root and the suffix. Many human languages have something like that in morphological structure.

4 4 Similar phenomena abound at all linguistic levels: Phonemes, Syllables, Words, Clauses, Sentences

5 5 Phonemes. Coarticulation: cat, keep, cool. Engwall (2000): articulographic study of how the pronunciation of Swedish fricatives is affected by surrounding vowels. Sequences such as asa, ɪsɪ, ɔsɔ, ʊsʊ, aɕa, ɔʂɔ, ʊfʊ, etc. For example, the context of labial vowels strongly increases lip protrusion.

6 6 Phonemes (continued). Also, the tongue is more anterior in the context of the front vowel /ɪ/ compared to back vowels (Engwall 2000: 10). That is, boundaries between segments are not really segmental. Trying to posit boundaries in the signal inevitably means a kind of digitalization.

7 7 Syllables. Language speakers often naturally feel the syllabic structure. But segmentation into syllables is usually less than clear-cut. For example, speakers of Pulaar confidently segment words into syllables, e.g. gor|ko 'man'. But cf. the behavior of geminated consonants. On the one hand, when asked to segment a word into syllables, speakers of Pulaar usually posit a boundary between the two copies of a geminated consonant: hok|kam 'give me'. On the other hand, in a Pulaar secret language the encrypting sequence lfV is reportedly inserted after the first syllable of a word (Gaden 1914, Labouret 1952: 108): hokkam 'give me', ndiyam 'water' > holfokkam, ndilfiyam. Geminates such as kk are thus inconsistent: in one way they belong to two different syllables; in another way they form the onset of a syllable (Koval 2000: 114, 185).
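
The secret-language rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that V in lfV copies the vowel of the first syllable (which is what both examples show), that the first syllable is treated as ending right after its vowel, and that only the five plain vowels occur. The function name `encrypt` and the vowel set are hypothetical, not taken from Gaden or Labouret.

```python
# Assumption: only short a, e, i, o, u; long vowels are not handled.
VOWELS = set("aeiou")

def encrypt(word: str) -> str:
    """Insert 'lf' plus a copy of the first vowel right after that vowel."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            # Everything up to and including the first vowel is treated
            # as the first syllable; the rest follows the inserted lfV.
            return word[: i + 1] + "lf" + ch + word[i + 1 :]
    return word  # no vowel found: leave the word unchanged

print(encrypt("hokkam"))  # holfokkam
print(encrypt("ndiyam"))  # ndilfiyam
```

Note that the sketch has to place the whole geminate kk after the inserted sequence, which is exactly the onset-like behavior that conflicts with the speakers' hok|kam segmentation.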

8 8 Words. Possessive constructions N + N. English is often said to have two kinds of genitives: the synthetic s-genitive (the queen's retinue) and the analytic of-genitive (the retinue of the queen). On the one hand, of is a preposition and thus clearly belongs to the possessor rather than to the possessed: the retinue [of the queen], lots [of stuff]. On the other hand, there are indications of reanalysis. Jurafsky et al. 1998: of is so often reduced that one must posit the allomorph [ɔ]. Native users of English feel that and render it in spelling, also altering the affiliation of the clitic: lots of > lotsa, couple of > coupla; Kinda outta luck (song by Lana del Rey). Graphic practices of this kind suggest that language users attach the clitic of to the possessed rather than to the possessor. In terms of Nichols 1986, in these kinds of examples English hesitates between dependent-marking and head-marking behavior. Of displays double-faced behavior in two ways: like any clitic, it is a semi-word, that is, something between a word and an affix; and it oscillates between two possible hosts.

9 9 Clauses. Widely held view of syntax from the discourse perspective (see Chafe 1994): local discourse structure consists of quanta, or chunks, or elementary discourse units (EDUs) (Kibrik and Podlesskaya eds. 2009). EDUs can be defined by a set of prosodic criteria. Thus identified EDUs typically coincide with clauses. The level of such coincidence mostly varies within the range between 1/2 and 3/4.

10 10 Clauses (continued)
Language (source): percentage of clausal EDUs
English (Chafe 1994): 60%
Mandarin (Iwasaki and Tao 1993): 39.8%
Sasak (Wouk 2008): 51.7%
Japanese (Matsumoto 2000): 68%
Russian (Kibrik and Podlesskaya eds. 2009): 67.7%
Upper Kuskokwim (Kibrik 2012): 70.8%

11 11 Clauses (continued). However, there is a significant residue: non-clausal EDUs; subclausal EDUs; increments (translation from a Russian spoken corpus, Night Dream Stories, Kibrik and Podlesskaya eds. 2009): "And suddenly I saw a box. With a ribbon on top." Increments appear after a clear prosodic boundary. At the same time, they semantically and grammatically fit into the preceding base clause. Such increments simultaneously belong and do not belong to the preceding clause. They are outliers in clause structure.

12 12 Paradigmatics. So far we have only discussed difficulties associated with the syntagmatic identification of units. The same problem applies to paradigmatic boundaries, that is, boundaries between classes, types, or categories in an inventory. Marginal phonemes: One might consider the voiceless velar fricative /x/ occurring in words such as Bach (the German composer) or loch (a Scottish lake) as a marginal phoneme for some speakers of English (Brinton and Brinton 2010: 53). Russian [w] in loan words: Russian has phonemes /v/ and /u/. English William > Russian Вильям (Viljam, with [v]) or Уильям (Uiljam, with [u], recently [w]). English wow > Russian: usually spelled вау (vau), pronounced [wau].

13 13 Semantics. Semantics provides particularly abundant evidence of non-discrete boundaries. A plethora of examples have been discussed in cognitive semantics. Textbook example from Labov 1973, "The boundaries of words and their meanings": cup vs. bowl.

14 14 Diachronic change. Diachrony provides innumerable examples of non-discrete boundaries between linguistic elements or stages. Hock and Joseph 1996: 237-238: Old English wēod 'plant' and wæ̅d(e) 'garment' both developed into modern English weed. The meaning 'garment' only survives in a couple of expressions, such as widow's weeds 'a widow's mourning clothes'. Modern speakers tend to connect this usage with the winning weed. The erstwhile meaning of wæ̅d(e) is echoed in the modern language as a faint trace.

15 15 Language wholeness. Languages are identifiable, but every language has internal variation. Consider a very small language, Upper Kuskokwim Athabaskan: an ethnic group of about 200 individuals in central interior Alaska, with about 20 remaining speakers. The members of the group have a clear feeling of identity, as well as separateness from other neighboring Athabaskan languages. Still, there is striking dialectal variation, in particular in the rendering of the Proto-Athabaskan coronal consonant series. [Map © Michael Krauss, 2011]

16 16 Language wholeness (continued)
Dialect                                  Interdental    Dental    Retroflex    As in
                                         'my tongue'    'snow'    'raven'
Conservative: no merger                  sitsula        tsetł'    dotron'      Tanana
Standard merger: loss of interdentals    sitsula        tsetł'    dotron'      Tsetsaut
Downriver merger: loss of retroflex      sitsula        tsetł'    dotson'      Koyukon
Merger of all three                      sitsula        tsetł'    dotson'      Ahtna

17 17 Language wholeness (continued). Note that the rendering of the coronal series is traditionally used as the basis for classifying the family into branches. This situation can be explained by geographical and demographic factors: the Upper Kuskokwim traditional territory probably occupied over 50,000 square kilometers, and contact between families/bands was traditionally seasonal or sporadic. Still, what identifies the language's wholeness and boundaries in terms of internal characteristics?

18 18 Proto-languages. Linguists often speak about proto-languages (Proto-Germanic, Proto-IE, etc.) as if they were fixed, 100% homogeneous communities without any internal variation. Dahl (2001) discussed the status of Old Nordic. He questions the notion of Common Nordic and the assumption that the Scandinavians changed their language all at the same time and in the same fashion, as if conforming to a EU regulation on the length of cucumbers (p. 227). Contrary to the traditional tree-like picture of a proto-language splitting into daughter languages, Dahl suggests that the spread of prestige dialects may have led to a decrease in diversity and to unification.

19 19 Language contact. Trudgill 2011: 56-58: Contact with Low German affected Scandinavian languages significantly. This influence can generally be described as simplification. That was possible because in the 1400s the population of cities such as Bergen and Stockholm was about one third or more German. When the non-native population reaches close to 50%, natives accommodate. Boundaries between languages are thus penetrable.

20 20 Other cognitive domains. Studies by the Russian psychologist Yuri Alexandrov. Alexandrov and Sergienko 2003: psychophysiological experiments demonstrate the non-disjunctive character of mind and behavior. Continuity is the overarching principle in the organization of living things at various levels (p. 105). Alexandrov and Alexandrova 2010: complementary, non-disjunctive character of cultures. Niels Bohr, discussing the relationships between cultures, emphasized that, unlike in physics, there is no mutual exclusion of properties belonging to different cultures.

21 21 Intermediate conclusion. Language (as well as cognition in general) simultaneously longs for discrete, segmented structure and tries to avoid it. The omnipresence of non-discreteness effects has not yet led to proper recognition in mainstream linguistic thinking. Linguists are often bashful about non-discreteness. But non-discreteness is not just a nuisance: non-discrete effects permeate every single aspect of language. This problem is at the core of theoretical debates about language.

22 22 Possible reactions. Digital linguistics: ignore non-discrete phenomena or dismiss them as minor; Ferdinand de Saussure: language only consists of identities and differences; the discreteness delusion; appeal of scientific rigor, but reductionism. More inclusive (analog) linguistics: often a mere statement of continuous boundaries and countless intermediate/borderline cases; a bit too simplistic.

23 23 Cognitive science. Wittgenstein: family resemblance. Rosch: prototype theory. Lakoff: radial categories. [Figure from Janda and Nesset 2012: categories A, B, C, D] A is the prototypical phoneme/word/clause/meaning...; B, C, and D are less prototypical representatives. We still need a theory for: boundaries between related categories; boundaries in the syntagmatic structure.

24 24 My main suggestion. In the case of language we see a structure that combines the properties of the discrete and the non-discrete: focal structure. Focal phenomena are simultaneously distinct and related. Focal structure is a special kind of structure found in linguistic phenomena, an alternative to discrete structure. It is the hallmark of linguistic and, possibly, cognitive phenomena, in contrast to simpler kinds of matter.

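25 25 Various kinds of structures [Figure: three panels (discrete structure, continuous structure, focal structure), each relating elements 1 and 2; in the focal structure, focal point (or anchor point) 1 and focal point 2 are shown together with an outlier and a hybrid]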

26 26 A possible analogy: neuronal structure with synapses

27 27 Examples
             focal point 1    hybrid            focal point 2
Syntagm.     det              [c]               sk
Paradigm.    v                w                 u
Diachr.      wēod             (widow's) weed    wæ̅d(e)
             Old Norse        Norwegian         Low German
etc., etc.

28 28 Caveat. The claim about non-discrete boundaries should not be overstated. Phonemes, words, clauses, and languages do exist. They are just not as discrete and segmental as we apparently want them to be. We should not replace the discrete structure with the idea of a mere continuum, basically non-structure. Cf. Goddard 2010: 233 defending the discrete character of meaning by dismissing the idea of a continuum or merging. Something like focal structure is in order as the major model of linguistic and cognitive matter.

29 29 Peripheral status of non-discrete phenomena in linguistics. Are linguists unaware of non-discreteness effects? No, they are aware of them. But they tend to ignore them. Why? I am not sure. But I suspect the answer is related to the well-known Kant's problem. [Figure label: distinct but related]

30 30 Kant's puzzle. The Critique of Pure Reason: the role of the observer, or cognizer, crucially affects the knowledge of the world. "The schematicism by which our understanding deals with the phenomenal world... is a skill so deeply hidden in the human soul that we shall hardly guess the secret trick that Nature here employs." It is possible that the human analytical mind is digital, and it wants its object of observation to be digital as well. In addition, standards of scientific thought have developed on the basis of physical, rather than cognitive, reality. Physical reality is much more prone to the discrete approach. Compared to the physical world, in the case of language and other cognitive processes Kant's problem is much more acute, because the mind here functions both as an observer and as an object of observation, so making the distinction between the two is difficult.

31 31 A paradoxical state of affairs. Language is full of non-discrete phenomena. But our digital mind is biased towards discreteness, perhaps partly because of the scientific tradition based on segmentation and categorization (Aristotelian, rational, left-hemispheric, etc.). It is like eyeglasses keeping only a part of reality and filtering out the rest. Addressing the analog reality in its entirety is often perceived as pseudo-science, or quasi-science at best. Is language unknowable, a Ding an sich?

32 32 What to do? We need to develop a more embracing linguistics and cognitive science that address non-discrete phenomena not as exceptions or the periphery of language and cognition, but rather as their core. Can we outwit our mind? Two suggestions towards this goal: 1. Object of investigation: concentrate on obviously non-discrete communication channels, not so burdened with the tradition of discrete analysis. 2. Methodology: a new type of models.

33 33 SUGGESTION 1: Look at communication channels other than verbal. Explore gesticulation accompanying speech. Michael Tomasello (2009): in order to understand how humans communicate with one another using a language, we must first understand how humans communicate with one another using natural gestures. I discuss a case study in Reference in discourse (2011). Explore prosody. Sandro Kodzasov (2011): there is a multitude of prosodic techniques defining the basic gestalts of our perception of the world. These communication channels are obviously less discrete than the verbal code. So it may be a good idea to develop new theoretical approaches on the basis of gesticulation and prosody, then apply them to traditional, segmental language.

34 34 Sentences. In written language, sentences are separated from each other by dedicated punctuation marks. Is the notion of sentence applicable to spoken language? Cf. the written language bias (Linell 2005): written language, inherently digital, hypnotizes people and makes them think that language is generally discrete. Is sentence viable? (Kibrik 2008). In brief, spoken Russian displays two major prosodic patterns: comma intonation (/,): rising on the main accent of the EDU; period intonation (\.): final falling on the main accent of the EDU. But there is also falling comma intonation (\,), or non-final falling: similar to comma intonation in terms of discourse semantics, formally similar to period intonation.

35 35 What to do? It appears that non-final falling is not as low as final falling. But the difference cannot be identified in absolute terms: there is great variation (gender, individual), and what is final falling in one person can be non-final in another. Employ the speaker's prosodic portrait: final falling targets the bottom of the given speaker's F0 range; non-final falling targets a level several dozen Hz (several semitones) higher than the final falling in the given speaker.
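
The idea of a speaker-relative "prosodic portrait" can be illustrated with a small Python sketch. It measures a falling accent's F0 target in semitones above the speaker's own F0 floor, using the standard conversion 12 * log2(f / f_floor), and labels it accordingly. The function names, the 2-semitone threshold, and the two speakers' floor values are hypothetical and chosen only for illustration; they are not values from the corpus study.

```python
import math

def semitones_above(f0_hz: float, floor_hz: float) -> float:
    """Distance of an F0 value above the speaker's F0 floor, in semitones."""
    return 12.0 * math.log2(f0_hz / floor_hz)

def classify_fall(target_hz: float, floor_hz: float, threshold_st: float = 2.0) -> str:
    """Label a falling accent relative to one speaker's F0 range.

    threshold_st is a hypothetical cut-off, not a value from the talk.
    """
    if semitones_above(target_hz, floor_hz) < threshold_st:
        return "final falling"
    return "non-final falling"

# Hypothetical speakers: A has an F0 floor of 80 Hz, B of 170 Hz.
for speaker, floor_hz in (("A", 80.0), ("B", 170.0)):
    print(speaker, classify_fall(175.0, floor_hz))
```

With these assumed floors, the same 175 Hz target comes out as non-final falling for speaker A and as final falling for speaker B, which is the point of using a relative rather than an absolute scale.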

36 36 F0 graph for an example [Figure: F0 graph of the Russian original, with falling accents marked on words such as \ozero, \malenkoe, \brevno, \mosta] Translation: There was a lake, / either a river, / or a lake, / but I guess a lake, \ because somehow it was small, \ not a big one. \ And across it there was a log, \ like a bridge. \

37 37 Representation of EDU continuity types (or phase types) in corpus

38 38 Sentences (continued). There are clearly contrasted, focal patterns: final falling (end) and rising (non-end). Speakers and listeners usually know when a sentence is completed and when it is not. Spoken sentences are the prototype of written sentences. In addition, the hybrid type must be recognized: non-final falling. It can be identified on the basis of speakers' prosodic portraits. This helps to deal with tremendous phonetic variation. With this analysis, the notion of spoken sentence remains viable.

39 39 SUGGESTION 2: Entertain another type of models. Methodological point. 1960s: a fashion for mathematical methods in linguistics. That did not bear much fruit, primarily because of the non-discreteness effects. Time for another attempt at bringing in more useful kinds of mathematics.

40 40 Ongoing project: Modeling referential choice in discourse. When we mention a person/object, we choose from a set of options: proper name (Kant), description (the philosopher), reduced form (he). Corpus of Wall Street Journal texts: words: 45016, EDUs: 5497, anaphors: 3994. Annotation for multiple variables, candidate factors of referential choice: distances to antecedent, antecedent's syntactic role, protagonisthood, animacy, etc. Machine learning algorithms: logical, logistic regression, compositions. Two-way task: full NP vs. pronoun. Three-way task: proper name vs. description vs. pronoun.

41 41 Results of machine learning modeling

42 42 Non-categorical referential choice. 100% accuracy cannot be reached. The choice is not always deterministic: often only one option is appropriate; sometimes both Kant and he are appropriate. Experiment (Mariya Khudyakova): nine texts in which the algorithms deviated in their prediction compared to the original referential choice: pronoun instead of a proper name. Each text was presented to 60 experiment participants, in one of the two variations: original (proper name) and altered (pronoun). Questions tested the understanding of the referent in question.

43 43 Non-categorical referential choice (continued). In seven texts out of nine, accuracy of answers to pronouns was the same as in answers to proper names. In these instances the algorithm correctly predicted a pronoun, even though deviating from the original referential choice. In two instances participants showed a significant drop in their accuracy. In these instances the algorithms erred in their prediction. Logistic regression provides the degree of certainty in prediction that can be, with due caution, interpreted as probability. In one more instance the algorithm showed excessively high certainty of prediction (0.89), which must not be the case given that the original choice was different. We are working on the improvement of the method (Kibrik et al. ms. 2012).
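
How logistic regression yields a graded "degree of certainty" rather than a yes/no answer can be shown with a minimal Python sketch. Everything specific here is an assumption made for illustration: the feature names loosely echo the annotated factors (distance, syntactic role, protagonisthood, animacy), but the weights, the bias, the 0.15 margin, and the function names are invented and are not the project's fitted model.

```python
import math

# Hypothetical weights for illustration only; not the project's fitted model.
WEIGHTS = {
    "distance_in_edus": -0.8,       # longer distance disfavors a pronoun
    "antecedent_is_subject": 1.2,
    "protagonist": 0.9,
    "animate": 0.5,
}
BIAS = 0.3

def pronoun_probability(features: dict) -> float:
    """Logistic model: weighted sum of features passed through a sigmoid."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def describe(p: float, margin: float = 0.15) -> str:
    """Treat predictions near 0.5 as non-categorical (margin is hypothetical)."""
    if abs(p - 0.5) <= margin:
        return "non-categorical: both options plausible"
    return "pronoun" if p > 0.5 else "full NP"

near = {"distance_in_edus": 1, "antecedent_is_subject": 1, "protagonist": 1, "animate": 1}
far = {"distance_in_edus": 5, "antecedent_is_subject": 0, "protagonist": 0, "animate": 0}
for label, feats in (("near", near), ("far", far)):
    p = pronoun_probability(feats)
    print(label, round(p, 2), describe(p))
```

The middle band in `describe` is one simple way to let a model say that both Kant and he are acceptable, instead of forcing a categorical choice.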

44 44 New type of models (continued). Non-categorical referential choice: a hybrid between the clear, focal instances. Probabilistic modeling and machine learning techniques can be used to simulate human behavior in non-categorical situations. We need to employ mathematical methods appropriate for the cognitive matter.

45 45 Conclusion. As soon as we invoke scientific thinking, we tend to immediately turn to discrete analysis. This may be the reason why discrete linguistics is so popular, in spite of the omnipresence and obviousness of non-discrete effects. This may be our inherent bias, or a habit developed in natural sciences, or a cultural preference. But in the case of language and other cognitive processes we do see the limits of the traditional discrete approach. It remains an open question whether linguists and cognitive scientists are able to eventually overcome the strong bias towards pure reason and discrete analysis, or whether language will remain a Ding an sich. But it is worth trying to circumvent this bias and to seriously explore the focal, non-discrete structure that is at the very core of language and cognition.

46 46 Thanks for your attention. CONGENIAL QUOTATIONS: "Unfortunately, or luckily, no language is tyrannically consistent. All grammars leak." (Sapir 1921: 38) "Words as well as the world itself display the orderly heterogeneity which characterizes language as a whole" (Labov 1973: 30) "The mind-brain is both modular and interconnected. To insist on one to the exclusion of the other is to short-change the enormous complexity of this quintessentially hybrid system" (Givón 1999: 107-108)

47 47 References
Alexandrov, Yuri I., and Natalia L. Alexandrova. 2010. Komplementarnost kultur. In: M.A. Kozlova (ed.) Ot sobytija k bytiju. M: Izd. dom VShE, 298-335.
Alexandrov, Yuri I., and Elena A. Sergienko. 2003. Psixologicheskoe i fiziologicheskoe: kontinualnost i/ili diskretnost? Psixologicheskij zhurnal 24.6, 98-109.
Brinton, Laurel J., and Donna Brinton. 2010. The linguistic structure of modern English. Amsterdam: Benjamins.
Chafe, W. 1994. Discourse, consciousness, and time. Chicago: University of Chicago Press.
Dahl, Östen. 2001. The origin of the Scandinavian languages. In: Dahl, Östen, and Maria Koptjevskaja-Tamm (eds.) The Circum-Baltic languages. Typology and contact. Vol. 1. Amsterdam: Benjamins, 215-236.
Engwall, Olov. 2000. Dynamical aspects of coarticulation in Swedish fricatives: a combined EMA & EPG study. TMH-QPSR 4/2000.
Givón, T. 1999. Generativity and variation: The notion "Rule of grammar" revisited. In: B. MacWhinney (ed.) The emergence of language. Mahwah: Erlbaum, 81-114.
Goddard, Cliff. 2011. Semantic analysis: A practical introduction. Oxford: OUP.
Hock, Hans Henrich, and Brian Joseph. 1996. Language history, language change, and language relationship. Berlin: Mouton de Gruyter.
Iwasaki S., Tao H.-Y. 1993. A comparative study of the structure of the intonation unit in English, Japanese, and Mandarin Chinese. Paper presented at the annual meeting of LSA.

48 48 References (continued)
Jurafsky, Daniel, Alan Bell, Eric Fosler-Lussier, Cynthia Girand, and William Raymond. 1998. Reduction of English function words in Switchboard. In Proceedings of ICSLP-98, Sydney.
Kibrik, A.A. 2008a. Est li predlozhenie v ustnoj rechi? In: A.V. Arxipov et al. (eds.) Fonetika i nefonetika. M.: JaSK, 104-115.
Kibrik, A.A. 2011. Reference in discourse. Oxford.
Kibrik, A.A. 2012. Prosody and local discourse structure in a polysynthetic language.
Kibrik A. A., Podlesskaya V. I. (eds.) 2009. Rasskazy o snovidenijax: Korpusnoe issledovanie ustnogo russkogo diskursa [Night Dream Stories: A corpus study of spoken Russian discourse]. Moscow: JaSK.
Koval A.I. 2000. Morfemika Pulaar-Fulfulde [Formal morphology of Pulaar-Fulfulde]. In: V.A. Vinogradov (ed.) Osnovy afrikanskogo jazykoznanija. Morfemika. Moscow: Vost. literatura, 103-290.
Labouret, Henri. 1952. La langue des Peuls ou Foulbé. Dakar: IFAN.
Labov, William. 1973. The boundaries of words and their meanings. In: R. Fasold (ed.) Variation in the form and use of language. Georgetown University Press, 29-62.
Linell, P. 1982. The written language bias in linguistics. Linköping, Sweden: University of Linköping.
Matsumoto K. 2000. Japanese intonation units and syntactic structure. Studies in Language 24: 525-564.
Trudgill, Peter. 2011. Sociolinguistic typology: Social determinants of linguistic complexity. Oxford: Oxford University Press.
Wouk F. 2008. The syntax of intonation units in Sasak. Studies in Language 32: 137–162.

49 49 Acknowledgements Yuri Alexandrov Mira Bergelson Svetlana Burlak Olga Fedorova Vera Podlesskaya Natalia Slioussar Valery Solovyev

