Presentation is loading. Please wait.

Presentation is loading. Please wait.

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 Coordinate Structures: On the Relationship between Parsing Preferences and Corpus Frequencies Ilona Steiner.

Similar presentations


Presentation on theme: "E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 Coordinate Structures: On the Relationship between Parsing Preferences and Corpus Frequencies Ilona Steiner."— Presentation transcript:

1 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 Coordinate Structures: On the Relationship between Parsing Preferences and Corpus Frequencies Ilona Steiner SFB 441, University of Tübingen Linguistic Evidence, 2-4 February 2006

2 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 2 Overview Previous work on the relationship between parsing preferences and corpus frequencies Study by Gibson & Schütze (1999) Study by Mitchell & Brysbaert (1998) Reanalysis by Desmet et al. (2002) Corpus analysis I: Parallel-structure effect Corpus analysis II: Disambiguation preference NP vs. S coordination Discussion

3 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 3 Tuning hypothesis (Cuetos et al., 1996) People prefer the most frequently occurring resolution of an ambiguity: Initial parsing preferences in syntactically ambigous sentences are determined by people's previous exposure to similar structures. → Parsing preferences and corpus frequencies should be correlated, i.e., the preferred construction should occur more frequently in corpora than the unpreferred construction

4 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 4 Study by Gibson & Schütze (1999) Disambiguation preferences in English noun phrase conjunction: „NP 1 Prep NP 2 Prep NP 3 and NP 4 “ (1) The talkshow host told [ NP 1 : a joke] about [ NP 2 : a man] with [ NP 3 : an umbrella] and … Preference in experiments: NP 3 > NP 1 > NP 2 Preference in corpus data: NP 3 > NP 2 > NP 1 → No correlation Conclusion: „…the sentence comprehension mechanism is not using corpus frequencies in arriving at its preference in this ambiguity and hence the decision principles of sentence comprehension and sentence production must be partially distinct.“

5 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 5 Study by Mitchell & Brysbaert (1998) Attachment preference in Dutch: „NP 1 Prep NP 2 Relative Clause“ (2) Someone shot [NP1: the servant] of [NP2: the actress] [RC: who was on the balcony] Preference in experiments: NP 1 > NP 2 Preference in corpus data:NP 2 > NP 1 → No correlation Were the experimental stimuli representative for the sentences in the corpus?

6 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 6 Desmet et al. (2002): Reanalysis of Mitchell & Brysbaert's (1998) corpus „NP 1 of NP 2 RC“ Experimental stimuli: both NPs referring to humans Corpus data: very few sentences with both NPs referring to humans Preference depends on nature of NP 1 : NP 1 / human: NP 1 preference NP 1 / non-human: NP 2 preference Sentence completion studies with corpus sentences confirmed this pattern → correlation! → If corpus data are controlled carefully to match the experimental sentences, the discrepancy between sentence comprehension and production disappears

7 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 7 Aims Relationship between two types of linguistic evidence with respect to coordinate structures in English: Parsing preferences Corpus frequencies Focus on two processing effects that have been found in reading-time studies: Parallel-structure effect Disambiguation preference in noun phrase vs. sentence coordination Comparison to corpus data in English Verbmobil treebank (TüBa-E, Hinrichs et al. (2000))

8 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 8 English Verbmobil Treebank (TüBa-E) Spoken dialogs in the domain of business appointments Annotated (manually) at the levels of Morpho-syntax (parts-of-speech categories) Syntactic phrase structure Function-argument structure Unedited naturally occurring dialogs: data reflect mechanisms of sentence production, and not factors that are due to editing processes

9 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 9 Parallel-structure effect In a coordinate structure, the second conjunct is read faster when it is structurally similar to the first one. (Frazier, Munn & Clifton, 2000) (3) Terry wrote [ NP: a long novel] and [ NP: a short poem]. (4) Terry wrote [ NP: a novel] and [ NP: a short poem]. → [ NP: a short poem] faster in (3) than in (4) Preference for structural similarity in coordination also present in corpora?

10 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 10 Coordination dataset Analysis of ca. 3000 sentences (CD13) of TüBa-E Coordination within complete sentences, conjunction: and → 274 occurrences of coordinate structures For each coordinate structure extraction of Syntactic category of mother and daughters Grammatical function of mother Number of conjuncts Length of conjuncts (number of words) Degree of redundancy in conjuncts (0 - 100%) Syntactic annotation in the treebank used as basis for the extraction process

11 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 11 Corpus analysis I: Degree of redundancy Manual inspection of degree of redundancy for each coordinate structure (with 2 conjuncts) 100%: both conjuncts have exactly the same structure including PoS-tags 0%: not a single node is redundant

12 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 12 Example sentence with 36% redundancy

13 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 13 Example sentence with 36% redundancy

14 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 14 Example sentence with 100% redundancy

15 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 15 Example sentence with 100% redundancy

16 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 16 Redundancy in coordination dataset Distribution of redundancy from 0% to 100% 36% of all coordinate structures contained conjuncts with identical structure (100%) How often does structural similarity occur randomly? Degree of redundancy No. of occurrences

17 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 17 Redundancy in coordination dataset Distribution of redundancy from 0% to 100% 36% of all coordinate structures contained conjuncts with identical structure (100%) How often does structural similarity occur randomly? Degree of redundancy No. of occurrences

18 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 18 Random dataset For each first conjunct in coordination dataset: extraction of random second 'conjunct' (randomly chosen from the corpus, independent of coordination) (5)[ NP: Twenty second] and [ NP: twenty fourth] are pretty bad.“ Randomly chosen phrase: [ NP: the first] Random phrase matches original second conjunct in syntactic category (here: NP) grammatical function (here: SBJ) length (here: 2 words) How many occurrences in random dataset are structurally identical?

19 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 19 Results: Parallel-structure effect Structural similarity: 13% in random dataset (pairs of original first and random second conjunct) 36% in coordination dataset Difference is highly significant ( χ ² (1) = 72.1; p < 0.001 ) → Structural similarity within coordination is significantly more frequent in corpus data than structural similarity of two phrases independent of coordination. → Corpus data match preference for structural similarity during parsing

20 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 20 Distribution of parallel occurrences Length effect: the shorter the conjuncts, the more frequently structural similarity occurs (F(1,3) = 6.63; p < 0.05 for length 1 vs. 2) Length of conjuncts % Parallel conjuncts

21 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 21 Disambiguation preference: NP vs. S coordination Reading-time experiments (Frazier, 1979) : (6) a. Peter kissed [ NP: Mary] and [ NP: her sister] too b. [ S: Peter kissed Mary] and [ S: her sister laughed] Local ambiguity that cannot be resolved prior to the last word Garden-path effect at “laughed“ in (6b) compared to “too“ in (6a) Preference to interpret [her sister] as part of a conjoined NP, and not as the beginning of a new sentence Is the preference for NP coordination (compared to S coordination) also present in corpora?

22 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 22 Corpus analysis II: NP vs. S coordination Analysis of CD6 (2554 sentences) and CD13 (2906 sentences) of TüBa-E Extraction of all occurrences with the form: “NP Subj …Verb…NP Compl and…“ Continuation with NP and S coordination should both be semantically possible (7) a. and I will bring [ NP: the doughnuts] and [ NP: coffee] I guess b. and [ S: Friday I have a nine to ten meeting] and [ S: I also have a meeting in the early afternoon] *c. I hate to mix [ NP: business] and [ NP: weekends]

23 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 23 Results: NP vs. S coordination “NP Subj …Verb…NP Compl and…“ Coordination with NP: 69% (n = 27) Coordination with S:31% (n = 12) [ t(38) = 2.57; p < 0.05 ] → Corpus frequencies match preference for NP coordination (compared to S coordination) during parsing

24 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 24 Discussion 1 Correlation between parsing preferences and corpus frequencies for Parallel-structure effect Disambiguation preference NP vs. S coordination Possible explanations: “Tuning hypothesis“: statistical pattern in natural language causes development of parsing preferences Common source for language production and comprehension

25 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 25 Discussion 2 If there is a common source, structure-based accounts would explain parsing preferences Parallel-structure effect: Recycling mechanism (Steiner, 2005) Preference for NP coordination: “Minimal Attachment Principle“ (Frazier, 1979), “Recency Preference“ (Gibson et al., 1996) Preferred construction is more economical, and is easier and faster to process Constructions that are easier to understand would also be easier to produce, and are produced more often → Common mechanism for language production and comprehension leads to faster reading times during parsing and to higher frequencies during production

26 E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 26 Discussion 3 With the present study we are not able to differentiate between “tuning hypothesis“ and a possible common source. We showed that production and comprehension are closer to each other than expected. If correlation between parsing preferences and corpus frequencies holds in general: → corpus data can be used to evaluate (at least qualitatively) models of sentence processing


Download ppt "E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 Coordinate Structures: On the Relationship between Parsing Preferences and Corpus Frequencies Ilona Steiner."

Similar presentations


Ads by Google