
1 Redundancy and reduction: Speakers manage syntactic information density
Torsten Jachmann, 16.05.2014
T. Florian Jaeger (2010)
Seminar „Information Theoretic Approaches to the Study of Language“

2 So far
- Frequent words have shorter linguistic forms (Zipf)
  o orthographic and phonological
- Word length (in phonemes/syllables) is correlated with predictability
- Information is context dependent
  o the more probable, the more redundant
- More predictable instances of the same word are produced with shorter duration and with less phonological and phonetic detail

3 Idea
Speakers manage the amount of information per amount of linguistic signal (at choice points)
- Morphosyntactic
  o e.g. auxiliary contraction: "he is" vs. "he's"

4 Idea
Speakers manage the amount of information per amount of linguistic signal (at choice points)
- Syntactic
  o e.g. optional that-mentioning: "This is the friend I told you about" vs. "This is the friend that I told you about"

5 Idea
Speakers manage the amount of information per amount of linguistic signal (at choice points)
- Elided constituents
  o e.g. optional argument and adjunct omission: "I already ate" vs. "I already ate dinner"

6 Idea
Speakers manage the amount of information per amount of linguistic signal (at choice points)
- Production planning
  o e.g. one or more clauses: "Next move the triangle over there" vs. "Next take the triangle and move it over there"

7 Idea
Speakers manage the amount of information per amount of linguistic signal (at choice points)
- Other languages?
  o German: "Er hat es verstanden" vs. "Er hat's verstanden" (he understood it)
  o Japanese: "行ってはダメ" (itte ha dame) vs. "行っちゃダメ" (iccha dame) (you can't go)

8 Idea
Speakers manage the amount of information per amount of linguistic signal (at choice points)
- The form with less linguistic signal should be less preferred whenever the reducible unit encodes a lot of information (in the context)

9 Uniform Information Density (UID)
Optimal:
- On average, each word adds the same amount of information to what we already know
- The rate of information transfer is close to the channel capacity
- Subject to many constraints (grammar, learnability)

10 Uniform Information Density (UID)
Efficient:
- Relatively uniform information distribution where possible
- No continuous under- or overutilization of the channel

11 UID
Definitions:
- Information density: information per time (articulatory detail is left out)
- Choice: subconscious (the existence of different ways to encode the intended message)
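A minimal sketch of these definitions, assuming made-up per-word probabilities (none of the numbers or the two example encodings come from the paper, and information per word stands in for information per time): each word's information is its surprisal, -log2 p(word | context), and the encoding whose surprisal profile has the lower variance is the more uniform one.

```python
import math

def surprisal(p):
    """Information carried by an outcome of probability p, in bits."""
    return -math.log2(p)

# Invented per-word probabilities in context for two encodings of the same
# message (the numbers are illustrative only, not corpus estimates).
with_that    = [0.6, 0.3, 0.5, 0.4, 0.5]   # e.g. "... believes that she left"
without_that = [0.6, 0.05, 0.4, 0.5]       # e.g. "... believes she left"

def profile(word_probs):
    s = [surprisal(p) for p in word_probs]
    mean = sum(s) / len(s)
    var = sum((x - mean) ** 2 for x in s) / len(s)
    return s, mean, var

for name, probs in [("with 'that'", with_that), ("without 'that'", without_that)]:
    s, mean, var = profile(probs)
    print(name, [round(x, 2) for x in s], "mean", round(mean, 2), "variance", round(var, 2))
```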

12 UID Example:

13 UID

14 UID
Goals:
- UID as a computational account of efficient sentence production
- Corpus-based studies are feasible and desirable
  o corpus of spontaneous speech
  o naturally distributed data

15 Data
7369 automatically extracted complement clauses (CC) from the "Paraphrase Stanford-Edinburgh LINK Switchboard Corpus" (Penn Treebank)
- minus 144 (2%) falsely extracted
- minus 71 (1%) with rare matrix verbs → extreme probability estimates

16 Data: focus
Actually: I(CC onset | context) = -log p(CC | context) - log p(onset | context, CC)
Here: I(CC | context) = -log p(CC | matrix verb lemma)
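A rough sketch of how the "here" quantity could be estimated from corpus counts; the verbs and counts below are invented for illustration and are not the paper's numbers.

```python
import math
from collections import Counter

# Invented counts: how often each matrix verb lemma occurs, and how often
# it is followed by a complement clause (CC). Not the paper's data.
verb_counts    = Counter({"think": 1200, "guess": 400, "worry": 90})
verb_cc_counts = Counter({"think": 900,  "guess": 380, "worry": 30})

def cc_information(verb):
    """I(CC | matrix verb lemma) = -log2 p(CC | lemma), estimated from counts."""
    p_cc = verb_cc_counts[verb] / verb_counts[verb]
    return -math.log2(p_cc)

for verb in verb_counts:
    print(verb, round(cc_information(verb), 2), "bits")

# A verb that almost always takes a CC ("guess") yields little information at
# the CC onset, so UID predicts that "that" is omitted more often after it.
```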

17 Data

18 Multilevel logit model
Why? Natural (uncontrolled) data: various factors might influence the outcome
- Several (control) predictors can be included in one model
- The contribution of each predictor can be estimated
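A minimal sketch of this modelling idea on synthetic data with hypothetical column names (the real model is a multilevel logistic regression with many more controls and a by-speaker random intercept; this sketch only shows how several predictors of "that"-use can be estimated jointly).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; the column names are hypothetical, not the corpus fields.
# One row per complement clause, outcome = 1 if "that" was produced.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "that_used":   rng.integers(0, 2, n),
    "cc_info":     rng.normal(0, 1, n),   # information density of the CC onset
    "speech_rate": rng.normal(0, 1, n),   # availability control
    "cc_length":   rng.normal(0, 1, n),   # length control
})

# All predictors enter one logistic regression, so each coefficient estimates
# one factor's contribution while the others are held constant.
model = smf.logit("that_used ~ cc_info + speech_rate + cc_length", data=df).fit(disp=False)
print(model.summary())

# The paper's model is additionally multilevel: it includes a by-speaker random
# intercept, which requires a mixed-effects tool (e.g. lme4's glmer in R).
```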

19 Controls: dependency
- Distance of matrix verb from CC onset → "THAT"
  o My boss thinks [I'm absolutely crazy.]
  o I agree with you [that, that a person's heart can be changed.]
- Length of CC onset (including subject) → "THAT"
- Length of CC remainder

20 Short sidetrack
Length of CC remainder: language production is incremental (+ heuristic complexity estimates?)

21 Controls: availability
- Lower speech rate → "THAT"
- Preceding pause → "THAT"
- Initial disfluency → "THAT"

22 Controls: availability
- Type of CC subject
  o "it" vs. "I"
  o other PRO vs. the above
  o other NP vs. the above
- Frequency of CC subject head
- Subject identity
  o identical subject in matrix clause and CC → ≈ "NONE"

23 Controls: availability
- Word form similarity
  o demonstrative pronoun "that"
  o demonstrative determiner "that" → ≈ "NONE"
- Frequency of matrix verb
  o higher frequency → "NONE"

24 Controls: ambiguity avoidance
- Possible garden path sentence → "THAT"
  o even unlikely cases were included: "I guess (this doesn't really have to do with…)"

25 Controls: matrix clause
- Position of matrix verb
  o further away from sentence-initial position → "THAT"
- Matrix subject
  o "you" vs. "I" → "THAT"
  o other PRO vs. the above → "THAT"
  o other NP vs. the above → "THAT"

26 Controls: others
- By-speaker random intercept
- Persistence
  o prime without "that" vs. no prime
  o prime with "that" vs. the above
- Gender
  o male → "NONE"

27 Information density
- Clearly significant (p < .0001)
- High information density of the CC onset → use of "that"
- Correlation with other predictors is negligible
- Contribution to the model's likelihood is high
  o at least 15% of the model quality is due to information density
  o single most important predictor
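One way to read the "at least 15% of the model quality" claim is as a comparison of model likelihoods with and without the information-density predictor. The sketch below illustrates that kind of comparison on synthetic data; it is not the paper's computation, and all names and numbers are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data in which the outcome really does depend on information density
# (coefficients and column names are invented for illustration).
rng = np.random.default_rng(1)
n = 500
cc_info = rng.normal(0, 1, n)
control = rng.normal(0, 1, n)
p = 1 / (1 + np.exp(-(0.8 * cc_info + 0.3 * control)))
df = pd.DataFrame({"that_used": rng.binomial(1, p),
                   "cc_info": cc_info, "control": control})

baseline = smf.logit("that_used ~ 1", data=df).fit(disp=False)
reduced  = smf.logit("that_used ~ control", data=df).fit(disp=False)
full     = smf.logit("that_used ~ control + cc_info", data=df).fit(disp=False)

# Share of the full model's log-likelihood gain over the intercept-only
# baseline that is lost when the information-density predictor is removed.
gain_full    = full.llf - baseline.llf
gain_reduced = reduced.llf - baseline.llf
print("share attributable to cc_info:", round(1 - gain_reduced / gain_full, 2))
```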

28 Information density
- The verb's subcategorization frequency serves as the estimate of information density
  o high CC-bias, low "that"-bias (e.g. "guess")
  o low CC-bias, high "that"-bias (e.g. "worry")
→ Syntactic reduction is affected by information density

29 Results

30 Information density
Prediction: UID can account for any type of reduction
- Phonetic and phonological reduction: so far, the patterns align with this prediction
- Availability accounts do not predict this
  o but they do predict lengthening of words

31 Information density
Optional case markers (or copula) in languages with flexible word order
- Japanese: ケーキが大好きだ (keeki ga daisuki da) vs. ケーキ大好き (keeki daisuki) (I love cake)

32 Information density
Reduced case markers
- Korean: 나는 독일 사람이야 (na neun togil saram iya) vs. 난 독일 사람이야 (nan togil saram iya) (I am German)

33 Information density
Optional object clitics and other argument-marking morphology
- Direct object clitics in Bulgarian
  o cannot be predicted by the availability account
  o could be predicted by ambiguity avoidance

34 Information density
Contracted auxiliaries
- English "he's" vs. "he is"
  o predicted by neither availability nor ambiguity avoidance

35 Information density
Ellipsis
- Japanese: 行きたいけど行けない (ikitai kedo ikenai) vs. 行きたいけど (ikitai kedo) (I want to go but (I can't go))
  o not: 行きたいけど（遅くなりそう） (ikitai kedo (osoku nari sou)) (I want to go, but I might be late)

36 Information density
Non-subject-extracted relative clauses
- Indefinite noun phrases < definite noun phrases
- Light head nouns (e.g. "the way") < heavy head nouns (e.g. "the priest")
- "I like the way (that) it vibrates"

37 Information density
Whiz-deletion (BE)
- Relativizer + auxiliary can be omitted
- "The smell (that is) released by a pig or a chicken farm is indescribable"

38 Information density
Object drop
- Verbs with a strong selectional preference: "Tom ate." vs. "Tom saw …"

39 Information density
- Many novel predictions across
  o different levels of linguistic production
  o languages
  o types of alternations
- Per-word entropy of sentences should stay constant throughout the discourse
  o words with high information density (in the context and discourse) should come later in the sentence
  o a priori per-word entropy should increase

40 Grammaticalization
Might grammaticalization interfere with the information-density effect?
Potentially grammaticalized cases:
  o matrix subject "I" or "you"
  o matrix verb "guess", "think", "say", "know", "mean"
  o matrix verb in present tense
  o matrix clause not embedded
→ 3033 cases remain
Information density is still highly significant (p < .0001)
→ UID may be a reason for grammaticalization

41 Noisy channel
- Basis of UID
- Audience design
  o the speaker considers the interlocutors' knowledge and processor state to improve the chance of successfully achieving their goal
- Modulating information density at choice points = a rational strategy for efficient production
- UID minimizes processing difficulties

42 Corpus-based research
Claim: "Lack of balance and heterogeneity of data make findings unreliable"
Responses:
- multilevel models
- avoidance of redundant predictors
  o if redundant → residualization
- inter-speaker variance + ecological validity
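Residualization here means regressing one of two collinear predictors on the other and entering only the residuals into the main model. A small illustrative sketch with invented predictors (word frequency and word length stand in for any redundant pair):

```python
import numpy as np
import statsmodels.api as sm

# Invented collinear predictors: length is partly determined by frequency.
rng = np.random.default_rng(2)
frequency = rng.normal(0, 1, 300)
length = -0.7 * frequency + rng.normal(0, 0.5, 300)

# Residualize: keep only the part of `length` that frequency cannot explain,
# and enter these residuals into the main model instead of raw `length`.
aux = sm.OLS(length, sm.add_constant(frequency)).fit()
length_resid = aux.resid
print("correlation before:", round(np.corrcoef(frequency, length)[0, 1], 2))
print("correlation after: ", round(float(np.corrcoef(frequency, length_resid)[0, 1]), 2))
```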

43 Corpus-based research
- Results extend more easily to all of English
- Many previous results replicate
- Provides evidence for effects that have so far been relatively understudied (e.g. similarity avoidance)
- "Effect size" needs to be taken with caution
  o not only the strength of effects but also their applicability matters
  o ambiguity avoidance (garden path sentences) is relatively rare

44 Questions and discussion

