Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses
Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives
University of Tübingen, 02.02.-04.02.2006
Thomas Hoffmann (University of Regensburg)

1. Introduction: Corpus vs. Introspection
“We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way.” (Sampson 2001: 135)
“You don’t take a corpus, you ask questions. […] You can take as many texts as you like, you can take tape recordings, but you’ll never get the answer.” (Chomsky in Aarts 2000: 5-6)
→ Which type of data are we left with then?

1. Introduction: Corpus vs. Introspection
“A corpus and an introspection-based approach to linguistics […] can be gainfully viewed as being complementary.” (McEnery and Wilson 1996: 16)
→ corpus and introspection data = corroborating evidence
→ case study: P placement in English relative clauses

1. Introduction: What to Expect
1. corpora vs. introspection?
2. categorical corpus data (ICE-GB corpus)
3. Magnitude Estimation experiment
4. variable corpus data (ICE-GB corpus)
5. conclusion

2. Corpora and Introspection
Arguments against corpus data:
“performance” problem
“negative data” problem
“homogeneity” problem
→ “only use introspection”

2. Corpora and Introspection
Arguments against corpus data: no corpus
“performance” problem:
  yet: performance result of competence; modern corpora representative
“negative data” problem:
  yet: only additional (different) data needed
“homogeneity” problem:
  yet: empirical claim that needs to be investigated
→ use corpora + additional data type

2. Corpora and Introspection
Arguments against introspection data:
“unnatural data” problem
“irrefutable data” problem
“illusion” problem
“stability” problem
→ “only use corpora”

2. Corpora and Introspection
Arguments against introspection data: no introspection
“unnatural data” problem:
  yet: only additional (context) data needed
“irrefutable data” problem:
  yet: depends only on collection method
“illusion” problem:
  yet: only additional (natural) data needed
“stability” problem:
  yet: empirical claim that needs to be investigated
→ use corpora + additional data type

2. Corpora and Introspection
Corpora and introspection are corroborating evidence:
introspection (= weaknesses of corpus data):
  + ungrammaticality
  + negative data
  + rare phenomena
corpus (= weaknesses of introspection data):
  + unexpected patterns
  + contextual factors
  + natural language

3. Case Study: Preposition Placement
I want a data source...
(1) a. which I can rely on [stranded preposition]
    b. on which I can rely [pied-piped preposition]
driving question: data source for empirical analysis of (1a,b)?

4. Empirical Study I: Corpus Data
Corpus used: International Corpus of English ICE-GB (Nelson et al. 2002) (educated Present-day British English, written & spoken)
Analysis tool: GOLDVARB computer programme (logistic regression; Robinson et al. 2001)
→ relative influence of various contextual factors (weights: > 0.5 = favouring, < 0.5 = inhibiting)

4. Empirical Study I: Corpus Data I
P strand/pied-piped tokens tested for:
1. finiteness
2. restrictiveness
3. relativizer
4. XP contained in (V / N, e.g. entrance to sth. / Adj, e.g. afraid of sth.)
5. level of formality
6. X-PP relationship (V prepositional, PP Loc_Adjunct, PP Man_Adjunct …)
except 2: all factors discussed in the literature before, but not w.r.t. interdependence (e.g. Bergh & Seppänen 2000; Trotta 2000)

4.1 Categorical corpus data
raw ICE-GB P-placement data: 1074 finite relative clauses
659 (61.4%) tokens: pied-piped
415 (38.6%) tokens: stranded
as expected: many categorical effects
→ accidental vs. systematic gaps?
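The percentages above follow directly from the raw counts. A minimal Python sketch of that arithmetic (the variable and function names are illustrative and not part of the original study):

```python
# Raw ICE-GB counts for finite relative clauses, as reported on the slide.
counts = {"pied-piped": 659, "stranded": 415}


def shares(token_counts):
    """Return each category's share of all tokens in percent, to one decimal."""
    n_total = sum(token_counts.values())
    return {label: round(100 * n / n_total, 1) for label, n in token_counts.items()}


total = sum(counts.values())
percentages = shares(counts)
print(total, percentages)
```

The two counts add up to the 1074 finite relative clauses reported, so the two shares are complementary.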

4.2 Categorical corpus data: that/Ø ≠ WH-relatives
1. relativizer: all that/Ø-tokens in ICE-GB stranded
176 that+P stranded tokens
(2) * a data source on that I can rely
177 Ø+P stranded tokens
(3) * a data source on Ø I can rely
→ ICE-GB result: expected
→ implications: (2) = (3)? / that ≠ WH-

4.3 Categorical corpus data: Constraints on P strand
2. X-PP relationship:
Literature (e.g. Bergh & Seppänen 2000; Trotta 2000): P stranding favoured with complement PPs, disfavoured with adjunct PPs
ICE-GB data: P stranding restricted to PPs which add thematic information to predicates/events

4.3 Categorical corpus data: Constraints on P strand
2. X-PP relationship: categorical effect of WH-PP Adjunct tokens:
a) just P+WH / no that/Ø+P in ICE-GB: manner, degree, frequency & respect PPs, e.g.:
(4) a. the ways in which the satire is achieved
    b. the ways which/that/Ø the satire is achieved in

4.3 Categorical corpus data: Constraints on P strand
2. X-PP relationship: categorical effect of WH-PP Adjunct tokens:
b) just P+WH / but that/Ø+P in ICE-GB: subcat. PP (put sth. in/into/under) & locative, affected loc., direction PP adjuncts
(5) a. … the world that I was working in and studying in
    b. … the world in which I was working and studying

4.3 Categorical corpus data: Constraints on P strand
Claim: comparison of WH- vs that/Ø shows:
P can only be stranded if: PP adds thematic information to predicates/events
manner & degree adjuncts: compare events “to other possible events of V-ing” (Ernst 2002: 59)
frequency & respect adjuncts: have scope over temporal information (frequency) and truth value of entire clause (respect)
→ don’t add thematic participant
→ P strand with these: systematic gap

4.3 Categorical corpus data: Constraints on P strand
Claim: comparison of WH- vs that/Ø shows:
P can only be stranded if: PP adds thematic information to predicates/events
subcat. PP & loc., affected loc., direction PP adjuncts:
→ add thematic participant
→ WH+P with these: accidental gap
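The claim on the preceding slides amounts to a simple decision rule over PP types. The following Python sketch spells that rule out; the set and function names are hypothetical labels introduced here for illustration, and the PP-type strings are shorthand for the categories named on the slides:

```python
# PP types as listed on the slides. The two sets encode the claim that
# P stranding requires a PP that adds a thematic participant.
NON_THEMATIC = {"manner", "degree", "frequency", "respect"}
THEMATIC = {"subcategorized", "locative", "affected locative", "direction"}


def classify_gap(pp_type):
    """Say whether the missing stranding pattern for a PP type is a
    systematic or an accidental gap under the thematic-information claim."""
    if pp_type in NON_THEMATIC:
        # No stranding at all in ICE-GB, not even with that/Ø.
        return "systematic gap"
    if pp_type in THEMATIC:
        # Stranding attested with that/Ø, only WH+P happens to be missing.
        return "accidental gap"
    raise ValueError(f"PP type not covered by the claim: {pp_type!r}")


for pp in ("manner", "locative"):
    print(pp, "->", classify_gap(pp))
```

The rule is only as strong as the corpus evidence behind it: the absence of a pattern in ICE-GB does not by itself show that the pattern is ungrammatical.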

4.3 Categorical corpus data: Constraints on P strand
Claim: comparison of WH- vs that/Ø shows:
P can only be stranded if: PP adds thematic information to predicates/events
Comparison of WH- vs that/Ø good evidence, but: still “negative data” problem
→ further corroborating evidence needed
→ Introspection: Magnitude Estimation study

5. Empirical Study II: Magnitude Estimation
relative judgements (reference sentence)
informal, restrictive RCs tested for:
  P-PLACEMENT (P strand, P pied-piped)
  RELATIVIZER (WH-, that-, Ø-)
  X-PP (V Prep, PP Temp/Loc_Adjunct, PP Manner/Degree_Adjunct)
tokens counterbalanced: 6 material groups of 18 tokens each + 36 fillers = 54 tokens
tokens randomized (Web-Exp software)
N = 36 British English native speakers (sex: 18m, 18f / age: 17-64)
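The token counts on this slide are consistent with a fully crossed 2 × 3 × 3 design. The sketch below enumerates the factor combinations in Python; it assumes (the slide does not say so explicitly) that each material group contains exactly one token per condition, and the level labels are shorthand chosen here:

```python
from itertools import product

# Factor levels as named on the slide (labels abbreviated for illustration).
P_PLACEMENT = ["stranded", "pied-piped"]
RELATIVIZER = ["WH-", "that", "Ø"]
X_PP = ["V Prep", "PP Temp/Loc Adjunct", "PP Manner/Degree Adjunct"]

# Fully crossed design: every combination of the three factors.
conditions = list(product(P_PLACEMENT, RELATIVIZER, X_PP))

n_fillers = 36  # filler count reported on the slide
n_per_group = len(conditions) + n_fillers

print(len(conditions), "conditions;", n_per_group, "sentences per material group")
```

Under that assumption, the 18 experimental tokens per group correspond to the 18 conditions, and adding the 36 fillers gives the 54 tokens reported.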

5. Empirical Study II: Magnitude Estimation
18 filler sentences: ungrammatical
a. That’s a tape I sent them that done I’ve myself (word order violation; original source: )
b. There was lots of activity that goes on there (subject contact clause; original source: )
c. There are so many people who needs physiotherapy (subject-verb agreement error; original source: )

5. Empirical Study II: Magnitude Estimation
ANOVA: significant effects
P-PLACEMENT: F(1,33) = 4.536, p < 0.05
RELATIVIZER: F(2,66) = 17.149, p < 0.001
P-PLACEMENT*X-PP: F(2,66) = 9.740, p < 0.001
P-PLACEMENT*RELATIVIZER: F(2,66) = 4.217, p < 0.02

5. Empirical Study II: Magnitude Estimation
ANOVA: not significant
AGE: F(1,33) = 2.760, p > 0.10
GENDER: F(1,33) = 1.495, p > 0.20
→ indicates: homogeneity of subjects

5. Empirical Study II: Magnitude Estimation
Post-hoc Tukey test: P-Place*Relativizer
P pied-piped: WH- >> that > Ø [… p < 0.010]
P strand: no difference: WH- = that = Ø [p >> 0.100]

5. Empirical Study II: Magnitude Estimation
Post-hoc Tukey test: P-Place*X-PP
P pied-piped: PP Man/Deg > V Prep [p 0.100]
P strand: no difference: V Prep > PP Temp/Loc > PP Man/Deg [p < 0.001]

Fig. 1: Magnitude estimation result for P + relativizer P+WH >> P+that > P+Ø
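Magnitude estimation scores like those summarised in the figures are commonly made comparable across subjects by dividing each judgement by the subject's score for the reference sentence and taking the logarithm. The slides do not state which transformation was applied here, so the Python sketch below is an assumption about standard practice, and the raw scores in it are invented purely for illustration:

```python
import math


def normalise(judgements, reference_score):
    """Divide each raw judgement by the reference score and take log10,
    so that 0 means 'as acceptable as the reference sentence'."""
    return [math.log10(score / reference_score) for score in judgements]


# Hypothetical raw scores from one subject; the reference sentence was given 100.
raw = [100.0, 50.0, 200.0]
normalised = normalise(raw, 100.0)
print(normalised)
```

On this scale a sentence judged half as good as the reference and one judged twice as good lie at equal distances on either side of zero.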

Fig. 2: Magnitude estimation result for P + relativizer compared with fillers
P+that & P+Ø = ungrammatical fillers
→ violation of “hard constraint” (Sorace & Keller 2005)

Fig. 3: Magnitude estimation result for relativizer + P
WH + P = that + P = Ø + P
V Prep > PP Temp/Loc > PP Man/Deg

Fig. 3: Magnitude estimation result for relativizer + P
V Prep > PP Temp/Loc > PP Man/Deg >> ungrammatical filler
→ violation of “soft constraint” (Sorace & Keller 2005)

6. Corroborating Evidence
Corroborating evidence:
corpus: man/deg PPs: no P stranded (not even with that/Ø)
→ semantic constraint on P stranded
experiment: man/deg PPs worst environment for P stranded
yet: better than ungrammatical fillers (soft constraint violation)

7. Empirical Study III: Corpus Data II
Constraints on variable corpus data (354 finite WH-tokens):
Goldvarb identified 3 independent factors:
(Log likelihood = -88.437; Significance = 0.004; Fit: X-square(27) = 27.977, accepted, p = 0.2040)
1. level of formality (as expected)
2. type of PP contained in (as expected)
3. restrictiveness (unexpected):
   restrictive RCs favour pied piping (weight: 0.592)
   nonrestrictive RCs clearly inhibit pied piping (i.e. favour stranding; weight: 0.248)
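To show what the two restrictiveness weights mean, the sketch below uses the logistic model that underlies Varbrul-type analyses, in which the log-odds of the overall input probability and of each factor weight are added. The two weights are taken from the slide; the input probability of 0.5 is a placeholder, since the slide does not report the actual input value, and the other two factor groups are left out:

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def predicted_probability(input_prob, weights):
    """Combine an input probability with factor weights on the logit scale."""
    return inv_logit(logit(input_prob) + sum(logit(w) for w in weights))


INPUT_PROB = 0.5  # hypothetical placeholder, not reported on the slide
restrictive = predicted_probability(INPUT_PROB, [0.592])
nonrestrictive = predicted_probability(INPUT_PROB, [0.248])
print(round(restrictive, 3), round(nonrestrictive, 3))
```

With a neutral input of 0.5 and a single factor, the predicted probability of pied piping equals the factor weight itself, which is why weights above 0.5 are read as favouring and weights below 0.5 as inhibiting.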

7. Empirical Study III: Corpus Data II
(6) And uhm he left me there with this packet of Durex which I hadn't got a clue what to do [with] to be totally honest
reasons for restrictiveness effect:
1. weaker semantic ties of non-restrictive clause with antecedent (pause/comma)
2. pied-piped P receives connective function
→ functionalisation of preposition placement in WH-relative clauses

8. Conclusion
corpus and introspection data = corroborating evidence:
corpora:
  frequency/context effects (e.g. level of formality)
  unexpected patterns (e.g. restrictiveness)
  categorical data → require further investigation
→ introspection:
  differentiation of accidental gaps (WH+P with PP Temp/Loc) and systematic gaps (X+P with PP Man/Deg)
  detection of degrees of ungrammaticality

9. References
Aarts, B. 2000. “Corpus linguistics, Chomsky and Fuzzy Tree Fragments”. In: Christian Mair and Marianne Hundt, eds. Corpus Linguistics and Linguistic Theory. Amsterdam and Atlanta, GA: Rodopi, 5-13.
Bard, E.G. et al. 1996. “Magnitude Estimation of Linguistic Acceptability”. Language 72:32-68.
Bergh, G. & A. Seppänen. 2000. “Preposition stranding with wh-relatives: A historical survey”. English Language and Linguistics 4:295-316.
Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.
Huddleston, R. et al. 2002. “Relative constructions and unbounded dependencies”. In: G.K. Pullum & R. Huddleston, eds. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press, 1031-1096.
Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
Levine, R. & I.A. Sag. 2003. “WH-Nonmovement”. 04.07.2004.

McEnery, T. and A. Wilson. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
Nelson, G. et al. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam, Philadelphia: Benjamins.
Penke, M. & A. Rosenbach. 2004. “What counts as evidence in linguistics? An introduction”. Studies in Language 28,3: 480-526.
Pesetsky, D. 1998. “Some principles of sentence production”. In: Pilar Barbosa et al., eds. Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press, 337-83.
Pickering, M. & G. Barry. 1991. “Sentence processing without empty categories”. Language and Cognitive Processes 6:229-259.
Quirk, R. et al. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Robinson, J. et al. 2001. “GOLDVARB 2001: A Multivariate Analysis Application for Windows”.

Sag, I.A. 1997. “English relative constructions”. Journal of Linguistics 33:431-484.
Sampson, G. 2001. Empirical Linguistics. London, New York: Continuum.
Schütze, Carson T. 1996. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.
Sorace, Antonella and Frank Keller. 2005. “Gradience in linguistic data”. Lingua 115,11: 1497-1525.
Trotta, J. 2000. Wh-clauses in English: Aspects of Theory and Description. Amsterdam and Atlanta, GA: Rodopi.
Van der Auwera, J. 1985. “Relative that — a centennial dispute”. Journal of Linguistics 21:149-179.
