Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses


Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses
Thomas Hoffmann (University of Regensburg)
Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, University of Tübingen

1. Introduction: Corpus vs. Introspection
“We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way.” (Sampson 2001: 135)
“You don’t take a corpus, you ask questions. […] You can take as many texts as you like, you can take tape recordings, but you’ll never get the answer.” (Chomsky in Aarts 2000: 5-6)
→ Which type of data are we left with, then?

1. Introduction: Corpus vs. Introspection
“A corpus and an introspection-based approach to linguistics […] can be gainfully viewed as being complementary.” (McEnery and Wilson 1996: 16)
→ corpus and introspection data = corroborating evidence
→ case study: P placement in English relative clauses

1. Introduction: What to Expect
1. corpora vs. introspection?
2. categorical corpus data (ICE-GB corpus)
3. Magnitude Estimation experiment
4. variable corpus data (ICE-GB corpus)
5. conclusion

2. Corpora and Introspection
Arguments against corpus data:
“performance” problem
“negative data” problem
“homogeneity” problem
→ “only use introspection”

2. Corpora and Introspection
Arguments against corpus data → “no corpus”?
“performance” problem; yet: performance is the result of competence, and modern corpora are representative
“negative data” problem; yet: only additional (different) data are needed
“homogeneity” problem; yet: an empirical claim that needs to be investigated
→ use corpora + an additional data type

2. Corpora and Introspection
Arguments against introspection data:
“unnatural data” problem
“irrefutable data” problem
“illusion” problem
“stability” problem
→ “only use corpora”

2. Corpora and Introspection
Arguments against introspection data → “no introspection”?
“unnatural data” problem; yet: only additional (context) data are needed
“irrefutable data” problem; yet: this depends only on the collection method
“illusion” problem; yet: only additional (natural) data are needed
“stability” problem; yet: an empirical claim that needs to be investigated
→ use corpora + an additional data type

2. Corpora and Introspection
Corpora and introspection are corroborating evidence:
introspection (covers the weaknesses of corpus data): + ungrammaticality, + negative data, + rare phenomena
corpus (covers the weaknesses of introspection data): + unexpected patterns, + contextual factors, + natural language

3. Case Study: Preposition Placement
(1) a. I want a data source which I can rely on. [stranded preposition]
    b. I want a data source on which I can rely. [pied-piped preposition]
Driving question: which data source should be used for the empirical analysis of (1a, b)?

4. Empirical Study I: Corpus Data
Corpus used: British component of the International Corpus of English, ICE-GB (Nelson et al. 2002): educated present-day British English, written & spoken
Analysis tool: GOLDVARB (logistic regression; Robinson et al. 2001), which estimates the relative influence of various contextual factors (weights > 0.5 = favouring)

4. Empirical Study I: Corpus Data I
Each P-stranded/pied-piped token was tested for:
1. finiteness
2. restrictiveness
3. relativizer
4. XP the PP is contained in (V; N, e.g. entrance to sth.; Adj, e.g. afraid of sth.)
5. level of formality
6. X-PP relationship (V prepositional, PP Loc_Adjunct, PP Man_Adjunct, …)
All factors except 2 have been discussed in the literature before, but not with respect to their interdependence (e.g. Bergh and Seppänen 2000; Trotta 2000).

4.1 Categorical corpus data
Raw ICE-GB P-placement data: 1,074 finite relative clauses
659 tokens (61.4%): pied-piped
415 tokens (38.6%): stranded
As expected: many categorical effects → accidental vs. systematic gaps?
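The percentages on this slide follow directly from the two raw counts. A minimal check in Python (the function name is ours, not part of the study):

```python
# Raw ICE-GB P-placement counts from the slide (finite relative clauses)
PIED_PIPED = 659
STRANDED = 415


def shares(pied_piped: int, stranded: int) -> tuple[int, float, float]:
    """Return the total number of tokens and each placement's share in percent (1 dp)."""
    total = pied_piped + stranded
    return (total,
            round(100 * pied_piped / total, 1),
            round(100 * stranded / total, 1))


total, pct_pied, pct_strand = shares(PIED_PIPED, STRANDED)
print(f"{total} tokens: {pct_pied}% pied-piped, {pct_strand}% stranded")
```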

4.2 Categorical corpus data: that/Ø ≠ WH-relatives
1. Relativizer: all that/Ø tokens in ICE-GB are stranded
176 that + P-stranded tokens; (2) *a data source on that I can rely
177 Ø + P-stranded tokens; (3) *a data source on Ø I can rely
ICE-GB result: as expected
Implications: (2) = (3)? / that ≠ WH-

4.3 Categorical corpus data: Constraints on P stranding
2. X-PP relationship:
Literature (e.g. Bergh and Seppänen 2000; Trotta 2000): P stranding is favoured with complement PPs and disfavoured with adjunct PPs
ICE-GB data: P stranding is restricted to PPs that add thematic information to predicates/events

4.3 Categorical corpus data: Constraints on P stranding
2. X-PP relationship: categorical effect of WH-PP adjunct tokens:
a) only P+WH, and no that/Ø+P, in ICE-GB: manner, degree, frequency & respect PPs, e.g.:
(4) a. the ways in which the satire is achieved
    b. *the ways which/that/Ø the satire is achieved in

4.3 Categorical corpus data: Constraints on P stranding
2. X-PP relationship: categorical effect of WH-PP adjunct tokens:
b) only P+WH, but that/Ø+P attested, in ICE-GB: subcategorized PPs (put sth. in/into/under) & locative, affected-location and direction PP adjuncts, e.g.:
(5) a. … the world that I was working in and studying in
    b. … the world in which I was working and studying

4.3 Categorical corpus data: Constraints on P stranding
Claim: the comparison of WH- vs. that/Ø shows that P can only be stranded if the PP adds thematic information to predicates/events.
Manner & degree adjuncts: compare events “to other possible events of V-ing” (Ernst 2002: 59)
Frequency & respect adjuncts: have scope over temporal information (frequency) and over the truth value of the entire clause (respect)
→ they don’t add a thematic participant
→ P stranding with these: systematic gap

4.3 Categorical corpus data: Constraints on P stranding
Claim: the comparison of WH- vs. that/Ø shows that P can only be stranded if the PP adds thematic information to predicates/events.
Subcategorized PPs & locative, affected-location and direction PP adjuncts:
→ they add a thematic participant
→ WH + stranded P with these: accidental gap

4.3 Categorical corpus data: Constraints on P stranding
Claim: the comparison of WH- vs. that/Ø shows that P can only be stranded if the PP adds thematic information to predicates/events.
The comparison of WH- vs. that/Ø is good evidence, but the “negative data” problem remains
→ further corroborating evidence is needed
→ introspection: Magnitude Estimation study

5. Empirical Study II: Magnitude Estimation
Relative judgements (against a reference sentence)
Informal, restrictive RCs tested for:
P-PLACEMENT (P stranded, P pied-piped)
RELATIVIZER (WH-, that-, Ø-)
X-PP (V Prep, PP Temp/Loc_Adjunct, PP Manner/Degree_Adjunct)
Tokens counterbalanced: 6 material groups, each with 18 experimental tokens + 36 fillers = 54 tokens
Tokens randomized (WebExp software)
N = 36 BE native speakers (sex: 18 m, 18 f; age: 17-64)
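Crossing the three factors gives 2 × 3 × 3 = 18 conditions, which matches the 18 experimental tokens per material group. The slides do not say how the 6 groups were built, so the rotation below is a hypothetical Latin-square-style scheme (item i receives condition (i + 3·group) mod 18); the function and label names are likewise ours, a sketch rather than the study's actual materials:

```python
import random
from itertools import product

# Factor levels as listed on the slide
P_PLACEMENT = ["P stranded", "P pied-piped"]
RELATIVIZER = ["WH", "that", "zero"]
X_PP = ["V Prep", "PP Temp/Loc", "PP Man/Deg"]

# Fully crossed design: 2 x 3 x 3 = 18 conditions
CONDITIONS = list(product(P_PLACEMENT, RELATIVIZER, X_PP))

N_GROUPS = 6
N_FILLERS = 36


def material_group(group: int, seed: int = 0) -> list[tuple]:
    """Build one hypothetical material group: 18 experimental items + 36 fillers.

    Item i is shown in condition (i + 3 * group) mod 18, so each group contains
    every condition exactly once while the item-condition pairing rotates.
    """
    shift = group * len(CONDITIONS) // N_GROUPS  # 3 positions per group
    experimental = [
        ("item", i, CONDITIONS[(i + shift) % len(CONDITIONS)])
        for i in range(len(CONDITIONS))
    ]
    fillers = [("filler", j, None) for j in range(N_FILLERS)]
    tokens = experimental + fillers
    random.Random(seed).shuffle(tokens)  # randomised presentation order
    return tokens


groups = [material_group(g, seed=g) for g in range(N_GROUPS)]
print(len(CONDITIONS), [len(g) for g in groups])
```

Each list then has 54 tokens, and across the six groups any given item appears in six different conditions.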

5. Empirical Study II: Magnitude Estimation
18 filler sentences are ungrammatical, e.g.:
a. That’s a tape I sent them that done I’ve myself (word order violation; original source: )
b. There was lots of activity that goes on there (subject contact clause; original source: )
c. There are so many people who needs physiotherapy (subject-verb agreement error; original source: )

5. Empirical Study II: Magnitude Estimation
ANOVA, significant effects:
P-PLACEMENT: F(1,33) = 4.536, p < 0.05
RELATIVIZER: F(2,66) = , p <
P-PLACEMENT*X-PP: F(2,66) = 9.740, p <
P-PLACEMENT*RELATIVIZER: F(2,66) = 4.217, p <

5. Empirical Study II: Magnitude Estimation
ANOVA, not significant:
AGE: F(1,33) = 2.760, p > 0.10
GENDER: F(1,33) = 1.495, p > 0.20
→ indicates homogeneity of subjects
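To see how the surviving F ratios relate to the stated p thresholds, the sketch below recomputes upper-tail p-values from F and its degrees of freedom using only the Python standard library. It is an illustrative check, not the original analysis. Assumptions: for df1 = 1 it uses the identity F(1, d) = t(d)² and integrates the t density numerically with Simpson's rule; for df1 = 2 it uses the closed form P(F > f) = (d / (d + 2f))^(d/2). RELATIVIZER is left out because its F value is missing from the transcript.

```python
import math


def t_two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value of Student's t via Simpson integration of the density on [0, |t|]."""
    t = abs(t)
    # log of the normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2))
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # P(0 <= T <= t)
    return max(0.0, 1.0 - 2.0 * area)


def f_upper_p(f: float, df1: int, df2: int) -> float:
    """P(F > f) for the two numerator degrees of freedom that occur on the slides."""
    if df1 == 1:
        return t_two_tailed_p(math.sqrt(f), df2)  # F(1, d) is t(d) squared
    if df1 == 2:
        return (df2 / (df2 + 2 * f)) ** (df2 / 2)  # closed form for df1 = 2
    raise NotImplementedError("this sketch only covers df1 = 1 or 2")


# F ratios as reported on the slides
REPORTED = {
    "P-PLACEMENT": (4.536, 1, 33),
    "P-PLACEMENT*X-PP": (9.740, 2, 66),
    "P-PLACEMENT*RELATIVIZER": (4.217, 2, 66),
    "AGE": (2.760, 1, 33),
    "GENDER": (1.495, 1, 33),
}

for effect, (f, df1, df2) in REPORTED.items():
    print(f"{effect}: F({df1},{df2}) = {f}, p = {f_upper_p(f, df1, df2):.4f}")
```

With these inputs, each recomputed p-value falls on the same side of the threshold stated on the slides (below 0.05 for P-PLACEMENT, above 0.10 for AGE, above 0.20 for GENDER).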

5. Empirical Study II: Magnitude Estimation
Post-hoc Tukey test, P-PLACEMENT*RELATIVIZER:
P pied-piped: WH- >> that > Ø [p < 0.010]
P stranded: no difference: WH- = that = Ø [p >> 0.100]

5. Empirical Study II: Magnitude Estimation
Post-hoc Tukey test, P-PLACEMENT*X-PP:
P pied-piped: no significant difference (PP Man/Deg > V Prep) [p > 0.100]
P stranded: V Prep > PP Temp/Loc > PP Man/Deg [p < 0.001]

Fig. 1: Magnitude estimation result for P + relativizer: P+WH >> P+that > P+Ø
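Magnitude estimation judgements are relative to each participant's score for the reference sentence, so raw numbers are normally normalised before they are averaged and plotted. The slides do not say which normalisation underlies the figures; a common choice in the magnitude estimation literature (cf. Bard et al. 1996) is to divide each score by the reference score and take the logarithm. A minimal sketch under that assumption; the judgements are invented toy values and the function names are hypothetical:

```python
import math
from collections import defaultdict
from statistics import mean


def log_ratio(judgement: float, reference: float) -> float:
    """Normalise one magnitude estimate against the participant's reference score."""
    if judgement <= 0 or reference <= 0:
        raise ValueError("magnitude estimates must be positive")
    return math.log10(judgement / reference)


def condition_means(responses):
    """responses: iterable of (reference_score, condition, raw_score) tuples."""
    by_condition = defaultdict(list)
    for reference, condition, score in responses:
        by_condition[condition].append(log_ratio(score, reference))
    return {cond: mean(vals) for cond, vals in by_condition.items()}


# Invented toy data: two participants who use very different number ranges
toy = [
    (10, "P+WH", 20), (10, "P+that", 2),
    (50, "P+WH", 100), (50, "P+that", 10),
]
print(condition_means(toy))
```

Because each score is expressed relative to the participant's own reference, the two toy participants end up with identical normalised values even though their raw numbers differ by a factor of five.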

Fig. 2: Magnitude estimation result for P + relativizer compared with fillers: P+that & P+Ø = ungrammatical fillers → violation of a “hard constraint” (Sorace & Keller 2005)

Fig. 3: Magnitude estimation result for relativizer + P: WH + P = that + P = Ø + P; V Prep > PP Temp/Loc > PP Man/Deg

Fig. 4: Magnitude estimation result for relativizer + P compared with fillers: V Prep > PP Temp/Loc > PP Man/Deg >> ungrammatical fillers → violation of a “soft constraint” (Sorace & Keller 2005)

6. Corroborating Evidence
Corpus: no stranded P with manner/degree PPs (not even with that/Ø) → semantic constraint on P stranding
Experiment: manner/degree PPs are the worst environment for P stranding, yet still better than the ungrammatical fillers (soft constraint violation)

7. Empirical Study III: Corpus Data II
Constraints on variable corpus data (354 finite WH-tokens): Goldvarb identified 3 independent factors (Log likelihood = ; Significance = 0.004; Fit: Chi-square(27) = , accepted, p = ):
1. level of formality (as expected)
2. type of PP contained in (as expected)
3. restrictiveness (unexpected): restrictive RCs favour pied piping (weight: 0.592); nonrestrictive RCs clearly inhibit pied piping, i.e. favour stranding (weight: 0.248)
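GOLDVARB factor weights lie between 0 and 1, with 0.5 as neutral. Because the program fits a logistic model, the difference between the logits of two weights in the same factor group is the log odds ratio between those two levels, whatever centring convention produced the weights. A minimal sketch applying this to the restrictiveness weights (the function names are ours, not GOLDVARB's):

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability-scale factor weight."""
    return math.log(p / (1 - p))


def odds_ratio(weight_a: float, weight_b: float) -> float:
    """Odds ratio of the application value (here: pied piping) for level a vs. level b."""
    return math.exp(logit(weight_a) - logit(weight_b))


RESTRICTIVE = 0.592     # pied-piping weight for restrictive RCs (from the slide)
NONRESTRICTIVE = 0.248  # pied-piping weight for nonrestrictive RCs (from the slide)

print(round(odds_ratio(RESTRICTIVE, NONRESTRICTIVE), 2))
```

Under this reading, the odds of pied piping are roughly 4.4 times higher in restrictive than in nonrestrictive WH-relatives, with the other factors in the model held constant.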

7. Empirical Study III: Corpus Data II
(6) And uhm he left me there with this packet of Durex which I hadn’t got a clue what to do [with] to be totally honest
Reasons for the restrictiveness effect:
1. weaker semantic ties of the non-restrictive clause with its antecedent (pause/comma)
2. the pied-piped P receives a connective function → functionalisation of preposition placement in WH-relative clauses

8. Conclusion
Corpus and introspection data = corroborating evidence:
Corpora: frequency/context effects (e.g. level of formality); unexpected patterns (e.g. restrictiveness); categorical data → require further investigation
Introspection: differentiation of accidental gaps (WH + P with PP Temp/Loc) vs. systematic gaps (X + P with PP Man/Deg); detection of degrees of ungrammaticality

9. References
Aarts, B. 2000. “Corpus linguistics, Chomsky and Fuzzy Tree Fragments”. In Christian Mair and Marianne Hundt, eds. Corpus Linguistics and Linguistic Theory. Amsterdam and Atlanta, GA: Rodopi.
Bard, E.G. et al. 1996. “Magnitude Estimation of Linguistic Acceptability”. Language 72.
Bergh, G. & A. Seppänen. 2000. “Preposition stranding with wh-relatives: A historical survey”. English Language and Linguistics 4.
Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgments. Thousand Oaks: Sage.
Huddleston, R. et al. 2002. “Relative constructions and unbounded dependencies”. In: R. Huddleston & G.K. Pullum, eds. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
Levine, R. & I.A. Sag. “WH-Nonmovement”.
McEnery, T. and A. Wilson. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
Nelson, G. et al. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam, Philadelphia: Benjamins.
Penke, M. & A. Rosenbach. 2004. “What counts as evidence in linguistics? An introduction”. Studies in Language 28,3.
Pesetsky, D. 1998. “Some optimality principles of sentence pronunciation”. In: Pilar Barbosa et al., eds. Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press.
Pickering, M. & G. Barry. 1991. “Sentence processing without empty categories”. Language and Cognitive Processes 6.
Quirk, R. et al. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Robinson, J. et al. 2001. “GOLDVARB 2001: A Multivariate Analysis Application for Windows”.
Sag, I.A. 1997. “English relative constructions”. Journal of Linguistics 33.
Sampson, G. 2001. Empirical Linguistics. London, New York: Continuum.
Schütze, Carson T. 1996. The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. Chicago: University of Chicago Press.
Sorace, Antonella and Frank Keller. 2005. “Gradience in linguistic data”. Lingua 115,11.
Trotta, J. 2000. Wh-clauses in English: Aspects of Theory and Description. Amsterdam and Atlanta, GA: Rodopi.
Van der Auwera, J. 1985. “Relative that – a centennial dispute”. Journal of Linguistics 21.