Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses
Thomas Hoffmann (University of Regensburg)
Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, University of Tübingen
1. Introduction: Corpus vs. Introspection
“We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way.” (Sampson 2001: 135)
“You don’t take a corpus, you ask questions. […] You can take as many texts as you like, you can take tape recordings, but you’ll never get the answer.” (Chomsky in Aarts 2000: 5-6)
Which type of data are we left with, then?
“A corpus and an introspection-based approach to linguistics […] can be gainfully viewed as being complementary.” (McEnery and Wilson 1996: 16)
Thesis: corpus and introspection data = corroborating evidence
Case study: P(reposition) placement in English relative clauses
1. Introduction: What to Expect
1. corpora vs. introspection?
2. categorical corpus data (ICE-GB corpus)
3. Magnitude Estimation experiment
4. variable corpus data (ICE-GB corpus)
5. conclusion
2. Corpora and Introspection
Arguments against corpus data (conclusion drawn from them: no corpus data, only use introspection):
“performance” problem: yet performance is the result of competence, and modern corpora are representative
“negative data” problem: yet this only means that additional (different) data are needed
“homogeneity” problem: yet this is an empirical claim that needs to be investigated
Conclusion instead: use corpora + an additional data type
Arguments against introspection data (conclusion drawn from them: no introspection, only use corpora):
“unnatural data” problem: yet this only means that additional (context) data are needed
“irrefutable data” problem: yet this depends only on the collection method
“illusion” problem: yet this only means that additional (natural) data are needed
“stability” problem: yet this is an empirical claim that needs to be investigated
Conclusion instead: use corpora + an additional data type
Corpora and introspection are corroborating evidence:
introspection (strengths = weaknesses of corpus data) | corpus (strengths = weaknesses of introspection data)
+ ungrammaticality | + unexpected patterns
+ negative data | + contextual factors
+ rare phenomena | + natural language
3. Case Study: Preposition Placement
I want a data source...
(1) a. which I can rely on [stranded preposition]
    b. on which I can rely [pied-piped preposition]
Driving question: which data source for an empirical analysis of (1a, b)?
4. Empirical Study I: Corpus Data
Corpus used: International Corpus of English, British component (ICE-GB; Nelson et al. 2002): educated present-day British English, written & spoken
Analysis tool: GOLDVARB (logistic regression; Robinson et al. 2001), which estimates the relative influence of various contextual factors (factor weights: > 0.5 = favouring, < 0.5 = disfavouring)
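How GOLDVARB factor weights translate into a predicted probability can be sketched as follows. This assumes the standard logistic Varbrul model, in which the log-odds of the input probability p0 and of each applicable factor weight are summed. The function names and the example input probability of 0.4 are hypothetical illustrations, not GOLDVARB's own interface or a value from this study.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def varbrul_probability(input_prob: float, weights: Sequence[float]) -> float:
    """Predicted probability of the application value (e.g. pied piping)
    under the logistic Varbrul model (assumed here):
        logit(p) = logit(p0) + sum(logit(p_i))
    p0 is the input probability, p_i the weight of the factor that applies
    in each factor group. A weight of 0.5 adds 0 log-odds, so it neither
    favours nor disfavours the outcome."""
    return inv_logit(logit(input_prob) + sum(logit(w) for w in weights))


# Hypothetical input probability of 0.4 (not a value reported in the study)
print(round(varbrul_probability(0.4, [0.5]), 3))  # 0.4   (neutral weight)
print(round(varbrul_probability(0.4, [0.7]), 3))  # 0.609 (favouring weight)
print(round(varbrul_probability(0.4, [0.3]), 3))  # 0.222 (disfavouring weight)
```

Working on the log-odds scale is what makes weights from different factor groups combinable by simple addition.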
Each stranded/pied-piped token was coded for:
1. finiteness
2. restrictiveness
3. relativizer
4. phrase containing the PP (V; N, e.g. entrance to sth.; Adj, e.g. afraid of sth.)
5. level of formality
6. X-PP relationship (prepositional V, PP Loc_Adjunct, PP Man_Adjunct, …)
All factors except 2 have been discussed in the literature before, but not with respect to their interdependence (e.g. Bergh & Seppänen 2000; Trotta 2000).
4.1 Categorical corpus data
Raw ICE-GB P-placement data: 1074 finite relative clauses
659 (61.4%) tokens: pied-piped
415 (38.6%) tokens: stranded
As expected: many categorical effects. Accidental vs. systematic gaps?
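The percentages follow directly from the raw counts; a quick arithmetic check (the helper name is illustrative):

```python
# Raw ICE-GB counts for finite relative clauses, as given above
PIED_PIPED = 659
STRANDED = 415
TOTAL = 1074

# The two variants exhaust the data set
assert PIED_PIPED + STRANDED == TOTAL


def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(share(PIED_PIPED, TOTAL))  # 61.4
print(share(STRANDED, TOTAL))    # 38.6
```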
4.2 Categorical corpus data: that/Ø ≠ WH-relatives
1. Relativizer: all that/Ø tokens in ICE-GB are stranded
176 that + P stranded tokens vs. 0 P + that tokens: (2) a data source on that I can rely
177 Ø + P stranded tokens vs. 0 P + Ø tokens: (3) a data source on Ø I can rely
ICE-GB result, expected implications: (2) = (3)? / that ≠ WH-?
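Categorical cells like these are what variable-rule analysis calls knockouts: a factor with 0% or 100% of one variant offers no variation to model, so such contexts are conventionally set aside before a multivariate GOLDVARB run. A minimal sketch that flags knockout factors in a cross-tabulation; the nested-dictionary layout and the function name are illustrative assumptions, and only the that/Ø counts come from the slide.

```python
from typing import Dict, List

# Counts per relativizer: variant -> number of tokens.
# The that/Ø figures are the ICE-GB counts given above; the layout is an assumption.
COUNTS: Dict[str, Dict[str, int]] = {
    "that": {"stranded": 176, "pied-piped": 0},
    "Ø": {"stranded": 177, "pied-piped": 0},
}


def knockouts(table: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the factors whose tokens all fall into a single variant
    (0% / 100% cells), i.e. contexts without variation to model."""
    result = []
    for factor, cells in table.items():
        attested = [variant for variant, n in cells.items() if n > 0]
        if len(attested) <= 1:
            result.append(factor)
    return result


print(knockouts(COUNTS))  # ['that', 'Ø']
```

Whether such a knockout reflects an accidental or a systematic gap is exactly what the corpus counts alone cannot decide.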
4.3 Categorical corpus data: Constraints on P stranding
2. X-PP relationship:
Literature (e.g. Bergh & Seppänen 2000; Trotta 2000): P stranding favoured with complement PPs, disfavoured with adjunct PPs
ICE-GB data: P stranding restricted to PPs that add thematic information to predicates/events
2. X-PP relationship: categorical effect of WH-PP adjunct tokens
a) only P + WH (pied-piped), no that/Ø + P (stranded) in ICE-GB: manner, degree, frequency & respect PPs, e.g.:
(4) a. the ways in which the satire is achieved
    b. the ways which/that/Ø the satire is achieved in
b) only P + WH (pied-piped), but that/Ø + P (stranded) attested in ICE-GB: subcategorized PPs (put sth. in/into/under) & locative, affected-locative and direction PP adjuncts, e.g.:
(5) a. … the world that I was working in and studying in
    b. … the world in which I was working and studying
Claim: a comparison of WH- vs. that/Ø shows that P can only be stranded if the PP adds thematic information to predicates/events
Manner & degree adjuncts: compare events “to other possible events of V-ing” (Ernst 2002: 59)
Frequency & respect adjuncts: have scope over temporal information (frequency) and over the truth value of the entire clause (respect)
These adjuncts don’t add a thematic participant, so P stranding with them is a systematic gap
Subcategorized PPs & locative, affected-locative and direction PP adjuncts: add a thematic participant, so WH + stranded P with these is an accidental gap
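The inference behind this claim can be written as a small decision procedure: that/Ø relatives can only strand, so the presence or absence of that/Ø + P tokens for a PP type decides whether a missing WH + stranded P is an accidental or a systematic gap. The PP-type labels and attestation flags paraphrase the ICE-GB findings above; the function name is hypothetical. Note that the procedure inherits the “negative data” problem, since corpus absence is its only evidence.

```python
from typing import Dict

# Is stranding with that/Ø attested in ICE-GB for this PP type? (per the slides)
THAT_ZERO_STRANDING_ATTESTED: Dict[str, bool] = {
    "manner": False,
    "degree": False,
    "frequency": False,
    "respect": False,
    "subcategorized": True,
    "locative": True,
    "affected locative": True,
    "direction": True,
}


def classify_gap(that_zero_stranded: bool, wh_stranded: bool) -> str:
    """Classify the status of WH + stranded P for one PP type.

    that/Ø relatives cannot pied-pipe, so attested that/Ø + P tokens show
    that stranding is possible with this PP type; a missing WH + P is then
    an accidental gap. Without any that/Ø + P tokens, the absence of
    stranding is taken to be a systematic gap."""
    if wh_stranded:
        return "no gap"
    if that_zero_stranded:
        return "accidental gap"
    return "systematic gap"


# In ICE-GB, WH + stranded P is not attested for any of these PP types
for pp_type, attested in THAT_ZERO_STRANDING_ATTESTED.items():
    print(f"{pp_type}: {classify_gap(attested, wh_stranded=False)}")
```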
The comparison of WH- vs. that/Ø is good evidence, but the “negative data” problem remains: further corroborating evidence is needed.
Introspection: Magnitude Estimation study
5. Empirical Study II: Magnitude Estimation
Relative judgements (against a reference sentence)
Informal, restrictive RCs tested for:
P-PLACEMENT (P stranded, P pied-piped)
RELATIVIZER (WH-, that-, Ø-)
X-PP (V Prep, PP Temp/Loc_Adjunct, PP Manner/Degree_Adjunct)
Tokens counterbalanced: 6 material groups of 18 tokens each + 36 fillers = 54 tokens
Tokens randomized (WebExp software)
N = 36 BE native speakers (sex: 18 m, 18 f; age: 17-64)
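The design arithmetic and one common way of preprocessing magnitude-estimation scores can be sketched as below. Crossing the three factors gives 2 × 3 × 3 = 18 conditions, which presumably corresponds to the 18 experimental tokens per material group. The log-ratio normalization (each raw score divided by the score the subject gave the reference sentence, then log-transformed, as is common in the magnitude-estimation literature, cf. Bard et al.) is an assumption: the slides do not say how the judgements were normalized. All names and the example scores are illustrative.

```python
import itertools
import math
from typing import List

# Factor levels as listed above
P_PLACEMENT = ["P stranded", "P pied-piped"]
RELATIVIZER = ["WH", "that", "Ø"]
X_PP = ["V Prep", "PP Temp/Loc_Adjunct", "PP Manner/Degree_Adjunct"]

# Full crossing of the three factors
CONDITIONS = list(itertools.product(P_PLACEMENT, RELATIVIZER, X_PP))
FILLERS = 36

print(len(CONDITIONS))            # 18
print(len(CONDITIONS) + FILLERS)  # 54 tokens per subject


def normalize(scores: List[float], reference_score: float) -> List[float]:
    """Express raw magnitude estimates relative to the reference sentence
    and log-transform them (assumed procedure): 0 = as acceptable as the
    reference, > 0 = more acceptable, < 0 = less acceptable."""
    return [math.log10(s / reference_score) for s in scores]


# Hypothetical raw scores from one subject who rated the reference sentence 20
print([round(x, 3) for x in normalize([10, 20, 80], 20)])  # [-0.301, 0.0, 0.602]
```

The log transform turns the ratio judgements of magnitude estimation into an interval-like scale suitable for the ANOVA that follows.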
18 ungrammatical filler sentences, e.g.:
a. That’s a tape I sent them that done I’ve myself (word order violation; original source: )
b. There was lots of activity that goes on there (subject contact clause; original source: )
c. There are so many people who needs physiotherapy (subject-verb agreement error; original source: )
ANOVA: significant effects
P-PLACEMENT: F(1,33) = 4.536, p < 0.05
RELATIVIZER: F(2,66) = , p <
P-PLACEMENT*X-PP: F(2,66) = 9.740, p <
P-PLACEMENT*RELATIVIZER: F(2,66) = 4.217, p <
ANOVA: not significant
AGE: F(1,33) = 2.760, p > 0.10
GENDER: F(1,33) = 1.495, p > 0.20
No significant age or gender effects: this indicates homogeneity of subjects
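The reported p-value bounds can be checked from the F statistics and their degrees of freedom. The sketch below computes the upper-tail probability of the F distribution through the regularized incomplete beta function, evaluated with the continued fraction familiar from Numerical Recipes, using only the Python standard library. This is an independent check, not the author's analysis script; in practice a statistics package (e.g. `scipy.stats.f.sf`) would do the same job.

```python
import math


def _beta_cf(a: float, b: float, x: float,
             max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the fraction where it converges fast, otherwise the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f: float, df1: int, df2: int) -> float:
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# F statistics whose p-value bounds are reported on the slides
for label, f, df1, df2 in [
    ("P-PLACEMENT", 4.536, 1, 33),  # reported: p < 0.05
    ("AGE", 2.760, 1, 33),          # reported: p > 0.10
    ("GENDER", 1.495, 1, 33),       # reported: p > 0.20
]:
    print(f"{label}: F({df1},{df2}) = {f}, p = {f_sf(f, df1, df2):.3f}")
```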
Post-hoc Tukey test: P-PLACEMENT*RELATIVIZER
P pied-piped: WH- >> that [p …] > Ø [p …] [p < 0.010]
P stranded: no difference: WH- = that = Ø [p > 0.100]
Post-hoc Tukey test: P-PLACEMENT*X-PP
P pied-piped: no significant difference: PP Man/Deg > V Prep [p > 0.100]
P stranded: V Prep > PP Temp/Loc > PP Man/Deg [p < 0.001]
Fig. 1: Magnitude estimation result for P + relativizer: P+WH >> P+that > P+Ø
Fig. 2: Magnitude estimation result for P + relativizer compared with fillers: P+that & P+Ø = ungrammatical fillers, i.e. violation of a “hard constraint” (Sorace & Keller 2005)
Fig. 3: Magnitude estimation result for relativizer + P: WH + P = that + P = Ø + P; V Prep > PP Temp/Loc > PP Man/Deg
Fig. 4: Magnitude estimation result for relativizer + P compared with fillers: V Prep > PP Temp/Loc > PP Man/Deg >> ungrammatical fillers, i.e. violation of a “soft constraint” (Sorace & Keller 2005)
6. Corroborating Evidence
Corpus: with man/deg PPs, no stranded P at all (not even with that/Ø), which points to a semantic constraint on P stranding
Experiment: man/deg PPs are the worst environment for P stranding, yet still better than the ungrammatical fillers (soft-constraint violation)
7. Empirical Study III: Corpus Data II
Constraints on variable corpus data (354 finite WH-tokens):
GOLDVARB identified 3 independent factors (Log likelihood = ; Significance = 0.004; Fit: X-square(27) = , accepted, p = ):
1. level of formality (as expected)
2. type of phrase the PP is contained in (as expected)
3. restrictiveness (unexpected): restrictive RCs favour pied piping (weight: 0.592); nonrestrictive RCs clearly inhibit pied piping, i.e. favour stranding (weight: 0.248)
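Under the logistic model behind GOLDVARB, the two restrictiveness weights can be translated into an odds ratio, which makes the size of the effect easier to read. A sketch, assuming that weights combine additively on the log-odds scale and that all other factors are held constant; the function names are illustrative.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# GOLDVARB factor weights for pied piping (restrictiveness factor group)
W_RESTRICTIVE = 0.592
W_NONRESTRICTIVE = 0.248


def odds_ratio(w1: float, w2: float) -> float:
    """Odds of the application value (here: pied piping) in context 1
    divided by its odds in context 2, all other factors held constant."""
    return math.exp(logit(w1) - logit(w2))


print(round(odds_ratio(W_RESTRICTIVE, W_NONRESTRICTIVE), 2))  # 4.4
```

Read this way, and within the assumptions of the model, the odds of pied piping in a restrictive RC are roughly 4.4 times those in a nonrestrictive RC.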
(6) And uhm he left me there with this packet of Durex which I hadn't got a clue what to do **[with]** to be totally honest
Reasons for the restrictiveness effect:
1. weaker semantic ties of a nonrestrictive clause to its antecedent (pause/comma)
2. the pied-piped P takes on a connective function
Result: a functionalisation of preposition placement in WH-relative clauses
8. Conclusion
Corpus and introspection data = corroborating evidence:
Corpora: frequency/context effects (e.g. level of formality); unexpected patterns (e.g. restrictiveness); categorical data require further investigation
Introspection: differentiation of accidental gaps (WH+P with PP Temp/Loc) vs. systematic gaps (X+P with PP Man/Deg); detection of degrees of ungrammaticality
9. References
Aarts, B. 2000. “Corpus linguistics, Chomsky and Fuzzy Tree Fragments”. In Christian Mair and Marianne Hundt, eds. Corpus Linguistics and Linguistic Theory. Amsterdam and Atlanta, GA: Rodopi.
Bard, E.G. et al. “Magnitude Estimation of Linguistic Acceptability”. Language 72.
Bergh, G. & A. Seppänen. 2000. “Preposition stranding with wh-relatives: A historical survey”. English Language and Linguistics 4.
Cowart, W. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.
Huddleston, R. et al. “Relative constructions and unbounded dependencies”. In: R. Huddleston & G.K. Pullum, eds. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Jackendoff, R. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
Levine, R. & I.A. Sag. “WH-Nonmovement”.
McEnery, T. and A. Wilson. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
Nelson, G. et al. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam and Philadelphia: Benjamins.
Penke, M. & A. Rosenbach. “What counts as evidence in linguistics? An introduction”. Studies in Language 28,3.
Pesetsky, D. “Some principles of sentence production”. In: Pilar Barbosa et al., eds. Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press.
Pickering, M. & G. Barry. “Sentence processing without empty categories”. Language and Cognitive Processes 6.
Quirk, R. et al. A Comprehensive Grammar of the English Language. London: Longman.
Robinson, J. et al. 2001. “GOLDVARB 2001: A Multivariate Analysis Application for Windows”.
Sag, I.A. “English relative constructions”. Journal of Linguistics 33.
Sampson, G. 2001. Empirical Linguistics. London and New York: Continuum.
Schütze, C.T. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.
Sorace, A. and F. Keller. 2005. “Gradience in linguistic data”. Lingua 115,11.
Trotta, J. 2000. Wh-clauses in English: Aspects of Theory and Description. Amsterdam and Atlanta, GA: Rodopi.
Van der Auwera, J. “Relative that – a centennial dispute”. Journal of Linguistics 21.