
1 A CORPUS LINGUISTIC STUDY OF ELLIPSIS AS A COHESIVE DEVICE Katrin Menzel Institute of Applied Linguistics, Translation and Interpreting, Saarland University Corpus Linguistics Conference – 27 June 2013, St. Petersburg

2 GECCo project GECCo/Home.html

3 GECCo project GECCo: German-English Contrasts in Cohesion, supported by DFG (1st phase, 2nd phase). Project team: Marilisa Amoia, Kerstin Kunz, Ekaterina Lapshinova, Katrin Menzel, Erich Steiner

4 Main research questions Which systemic resources of cohesion are instantiated in English and German texts in different registers/genres? How frequent are they? Which cohesive meanings do they express?

5 Research goals
- analyse cohesive resources provided by the language systems and their instantiations in texts
- explore contrasts in form, frequency, function and meaning relations across and between languages, registers and production types

6 Motivation Filling major research gaps: comprehensive accounts of cohesion exist only from a monolingual perspective (e.g. Halliday & Hasan 1976); empirical monolingual or contrastive analyses on text and discourse level mainly deal with individual phenomena

7 CORPUS RESOURCES procedures to extract cohesive phenomena require compilation, annotation and exploitation of GECCo Corpus (written and spoken texts) assumption: no clear dividing line but a continuum from written to spoken


9 the written part of GECCo is a translation corpus and consists of various genres (popular-scientific, fictional and tourism texts, prepared speeches, political essays, corporate communication, instruction manuals, websites) of English and German original texts that are aligned with their translations

10 the spoken part of the corpus is a comparable corpus of English and German original texts (interviews, academic lectures, web forums, talk shows…)

11 Corpus resources






17 Types of cohesive devices

18 Present Study: ellipsis types: nominal, verbal, clausal (cf. Halliday & Hasan 1976) across:
- languages: English vs. German
- registers: different text types
- production types: originals vs. translations

19 Research goals describing ellipsis from a cross-linguistic viewpoint in English and German enhancing corpus linguistic methods to cover a comprehensive variety of ellipses in different registers of spoken and written language in a bilingual corpus (GECCo) of about 1 million words

20 Defining cohesive ellipsis Ellipsis as a cohesive device is the omission of an element normally required by the grammar that can be recovered by the linguistic context. Halliday/Hasan: nominal, verbal, clausal ellipsis Examples: There are two approaches to problem-solving: the empirical [ ] and the rational [ ]. I want to help you, but I can’t [ ]. What is the capital of the Philippines? – Manila [ ].

21 Ellipsis as a cohesive device cohesive ellipsis vs. other types of ellipsis and fragments (e.g. headlines, exophoric ellipsis without textual antecedent, lexicalised ellipsis) missing information must be supplied from the surrounding co-text (usually anaphorically)

22 Some differences between English and German e.g. nominal ellipsis: in German, the ellipsis remnant has to show strong morphological agreement in order to license the elided noun: ein grünes [Haus], keine [Häuser], keins [?]; in a few cases, this also happens in English (mine, none…)

23 Verbal ellipsis in English and German: lack of correspondence between the English and German verbal systems, hence more differences between E/G than with regard to nominal ellipsis, e.g. inclusive imperative: Let’s [go]. / Let’s not. (does not exist in German: *Lass uns!) English examples in GECCo: many subtypes of verbal ellipsis with varying degrees of complexity. German: mainly ellipses of the modal verb complement (Er muss [ ])

24 Clausal ellipsis in English and German Differences G/E: case Von wem wurde der Junge untersucht? – (Von) Einer Psychologin. * Eine Psychologin. Who was the boy examined by? – A psychologist. Sluicing: Er will jemandem schmeicheln, aber sie wissen nicht wem [ ] He wants to flatter someone, but they don't know who [ ].

25 Practical Issues Annotating / querying ellipsis in corpora

26 Manual annotation with MMAX2, to compare with automatic annotation. Pointer relation can be used to link a bridging expression to its bridging antecedent.

27 CQP queries: to query empty elements we have to find syntactically incomplete or deficient structures German: Stuttgart-Tübingen-TagSet STTS, English: Penn Treebank tagset

28 Querying the corpus with CQP (German: Stuttgart-Tübingen-TagSet STTS; English: Penn Treebank tagset)
Sample patterns for nominal ellipsis, with CQP query design and English examples:
1. Possessive marker 's not followed by a noun
   CQP query: [word="'s"][pos!='nn|ne'];
   Examples: Mrs. Wood’s [hat]. That was your dream. Kim’s [dreams] were all nightmares.
2. Nominal ellipsis after article/det/numeral/quantifier/possessive marker (+ optional adjective)
   CQP queries in German subcorpora: [pos='adja'][pos='vafin']; (adjective + finite verb) or [pos='art'][pos='adja'][pos!='nn|ne']; (article + adjective, not followed by noun/proper noun)
   CQP query in English subcorpora (different tagset): [pos='jj'][pos='vv.*']; (adjective + verb)
   Examples: I accept the first argument, but reject the other two [ ] / the third [ ]. While Kim had lots of books, Pat had very few [ ]. I went up that skyscraper in Boston, but the tallest [ ] is in Chicago. …

29 Sample CQP queries
GO: [pos='adja'][pos='vafin']; (adjective + finite verb)
GO: [pos='art'][pos='adja'][pos!='nn|ne']; (article + adjective, not followed by noun/proper noun)
EO/ETRANS (different tagset): [pos='jj'][pos='vv.*']; (adjective + verb)
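The logic of these tag-sequence queries can be sketched outside CQP. The following Python snippet is only an illustration, not part of the GECCo tooling: the names `match_sequence`, `ART_ADJ_NO_NOUN`, `ADJ_FINITE_VERB` and `SENTENCE` are hypothetical, and the POS tags on the sample sentence were assigned by hand for the example, using the lowercase tag names from the queries above. Each pattern slot is a regular expression over the tag plus a negation flag, matched against the whole tag value.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag)
Slot = Tuple[str, bool]  # (regex over the POS tag, negated?)


def match_sequence(tokens: List[Token], pattern: List[Slot]) -> List[int]:
    """Return the start index of every token window whose tags fit the pattern."""
    hits = []
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        # A slot fits if the full-match result differs from its negation flag.
        if all(
            (re.fullmatch(regex, pos) is not None) != negated
            for (_, pos), (regex, negated) in zip(window, pattern)
        ):
            hits.append(start)
    return hits


# Mirrors [pos='art'][pos='adja'][pos!='nn|ne'];
ART_ADJ_NO_NOUN: List[Slot] = [("art", False), ("adja", False), ("nn|ne", True)]
# Mirrors [pos='adja'][pos='vafin'];
ADJ_FINITE_VERB: List[Slot] = [("adja", False), ("vafin", False)]

# Hand-tagged for illustration only (assumed tags, not tagger output).
SENTENCE: List[Token] = [
    ("Der", "art"), ("größte", "adja"), ("und", "kon"), ("schönste", "adja"),
    ("ist", "vafin"), ("der", "art"), ("Naschmarkt", "ne"), (".", "$."),
]

if __name__ == "__main__":
    print(match_sequence(SENTENCE, ART_ADJ_NO_NOUN))
    print(match_sequence(SENTENCE, ADJ_FINITE_VERB))
```

Both patterns flag the coordinated remnant "Der größte und schönste [ ]" as an ellipsis candidate. Such hits are candidates only and still need manual checking.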

30 Some manual correction is necessary. Difficulty for the tagger: in English, many ellipsis remnants have multiple word class membership

31 pronouns (e.g. "other": det/adj/pron), words ending in -ing: the second being very... To know whether "being" is a verb or a noun, context has to be taken into account, as tagging is sometimes wrong and leads to irrelevant examples in query results.

32 e.g. "one": number/pronoun/det/adj/noun, sometimes used with nominal ellipsis, sometimes with nominal substitution: the green one (= nominal substitution); we saw one [lion] (= nominal ellipsis)
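A rough pre-sorting of "one" hits before manual correction could look like the sketch below. It is a simplified heuristic assumed for illustration, not the procedure used in the study: the function name `classify_one`, the three labels and the lowercase tag prefixes (`jj` for adjectives, `nn` for nouns) are all hypothetical, and the example tags were assigned by hand.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag)


def classify_one(tokens: List[Token], i: int) -> str:
    """Pre-sort the token 'one' at index i by its immediate neighbours."""
    prev_pos = tokens[i - 1][1] if i > 0 else ""
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else ""
    if next_pos.startswith("nn"):
        # 'one lion': the head noun is overt, so nothing is elided.
        return "overt noun"
    if prev_pos.startswith("jj"):
        # 'the green one': 'one' stands in for the noun after an adjective.
        return "substitution candidate"
    # 'we saw one [lion]': bare 'one' with no noun following.
    return "ellipsis candidate"


if __name__ == "__main__":
    print(classify_one([("the", "dt"), ("green", "jj"), ("one", "cd")], 2))
    print(classify_one([("we", "pp"), ("saw", "vvd"), ("one", "cd")], 2))
```

Because the heuristic relies on the tags of neighbouring tokens, it inherits the tagging errors described above, so its output is a starting point for manual sorting rather than a final annotation.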

33 sometimes ellipsis remnants are zero derivations (especially in English this additionally contributes to word class ambiguity for taggers, e.g. N/V: salt, ship, Adj./N: modal)

34 some nominalised elements (tagged as adj. / numerals), which often refer to people or abstract concepts, and lexicalised / context-free ellipsis also have to be sorted out manually:
- the immoral, the rich
- the elderly, a 1 year old
- the Fantastic Four
- the big two [?] (referring to Oxford and Cambridge university, lexicalised?)
- lexicalised idiomatic ellipsis: eine [ ] rauchen

35 Normalized frequencies of typical ellipsis subtypes per words in 4 German & English registers of GECCo. Columns: nominal ellipsis, verbal, clausal, ∑. Rows: GO Interview, EO Interview, GO Academic, EO Academic, GO Fiction, EO Fiction, GO Tourism, EO Tourism.

36 Spoken registers EO/GO GECCo: Redundant elements were inserted - instead of elided -, words were repeated, even in an ungrammatical way to remind the hearer of items that were mentioned earlier in the text. - Da machen wir etwas was es absolut verrückt ist. - Ich war bis 1975 war ich in Stuttgart (GO Interview) - For me it’s important is identifying where you come from. (EO Interview)

37 Translation as a cause of linguistic change with regard to cohesive devices

38 cohesive devices, especially ellipsis and substitution, are particular elements where translations involve specific shifts and some kind of ‘fingerprints’ (Gellerstam 2005) or ‘shining through’ (Teich 2003) from the source language into the target language

39 ‘shining through’ (Teich 2003) from source language into target language: empirically identifiable traces of source language interference in terms of proportional frequencies of constructions that have the potential to spread from translated to non-translated target language texts

40 - translation-induced language change is subtle and often overlooked, but in recent years, some interesting studies have demonstrated the significance of translation as a site of language contact (e.g. House 2006) - lexical and orthographic level is probably affected most frequently as words are sometimes borrowed through translation

41 - source language interference with regard to syntactic or discourse-structural patterns, such as the use of cohesive devices, is more complex and less easily perceptible without a quantitative analysis of proportional frequencies in larger text corpora - using translation and parallel text corpora, House (2011) for instance has demonstrated that textual norms in German are adapted to anglophone ones

42 analysis of GECCo corpus indicates that, compared to English originals, English translations of German texts include a higher frequency of nominal ellipsis after adjectives where we would normally expect for example ‘one/s’, ‘of them’, a general or a specific noun: (1) ein Denken …, das strenger ist als das begriffliche [ ] translation: a thinking more rigorous than the conceptual [ ] (2) Der größte und schönste [ ] ist der Naschmarkt. translation: The largest and most impressive [ ] is Naschmarkt.

43 On the other hand, translations into English seem to have a higher frequency of 'one' as a substitute where it is not obligatory (e.g. after 'next', 'second', 'another', 'which').

44 translators often insert ‘tun’ in the case of English lexical verb ellipsis or use it as a direct translation of ‘do’: If we do not, no one else will [ ]. translation in corpus: Wenn wir es nicht tun, wird niemand es tun. just as Ukraine and South Africa had done and as Libya is doing today; translation: so wie es die Ukraine und Südafrika getan haben, und wie Libyen es heute tut

45 corpus extraction results show that number of hits of lemma ‘tun’ is much higher in German translations (41 / words) than in German originals (29 / words)
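Comparisons of this kind rest on normalizing raw hit counts by subcorpus size. A minimal sketch of that arithmetic follows. The function name `normalized_frequency` is hypothetical, the normalization base is left as a parameter, and the counts in the usage lines are invented placeholders rather than GECCo figures.

```python
def normalized_frequency(hits: int, corpus_size: int, per: int) -> float:
    """Scale a raw hit count to a frequency per `per` words."""
    if corpus_size <= 0 or per <= 0:
        raise ValueError("corpus_size and per must be positive")
    return hits * per / corpus_size


if __name__ == "__main__":
    # Placeholder counts for two subcorpora of different size (not real data).
    translations = normalized_frequency(hits=60, corpus_size=150_000, per=10_000)
    originals = normalized_frequency(hits=50, corpus_size=200_000, per=10_000)
    print(translations, originals)
```

Normalizing makes subcorpora of different sizes comparable: a smaller subcorpus with fewer raw hits can still show the higher relative frequency.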

46 translations contribute to semantic bleaching of this verb (writers of German original texts usually tend to avoid the verb ‘tun’ as a substitute for a main verb for stylistic reasons)

47 depending on various factors such as standardization of the language and genre and amount and prestige of translated texts, language specific structures and innovations may spread from translated to non-translated target language texts

48 References:
Evert, S. The CQP Query Language Tutorial. IMS, Universität Stuttgart.
Gellerstam, M. 2005. Fingerprints in Translation. In: G. Anderman and M. Rogers (eds.), In and out of English: For Better, for Worse. Clevedon: Multilingual Matters.
Halliday, M. A. K. and R. Hasan. 1976. Cohesion in English. London: Longman.
House, J. 2006. Covert Translation, Language Contact, Variation and Change. In: SYNAPS
House, J. 2011. Using translation and parallel text corpora to investigate the influence of Global English on text norms in other languages. In: A. Kruger et al. (eds.), Corpus-based Translation Studies. London: Continuum.

49 Kunz, K. and E. Lapshinova-Koltunski. 2011. Tools to Analyse German-English Contrasts in Cohesion. In: Proceedings of GSCL-2011, Hamburg, Germany.
Neumann, S. and S. Hansen-Schirra. The CroCo Project. Cross-linguistic corpora for the investigation of explicitation in translations. In: Proceedings from the Corpus Linguistics Conference Series (PCLC), Vol. 1 no. 1.
Steiner, E. Empirical studies of translations as a mode of language contact - “explicitness” of lexicogrammatical encoding as a relevant dimension. In: P. Siemund and N. Kintana (eds.), Language contact and contact languages. Amsterdam: John Benjamins (Hamburg Studies in Multilingualism Vol. 7).
Teich, E. 2003. Cross-Linguistic Variation in System and Text: A Methodology for the Investigation of Translations and Comparable Texts. Berlin: Mouton de Gruyter.

50 Thank you for your attention! Do you have any questions? Comments? Katrin Menzel

