Corpus-based Translation Studies: Theoretical Introduction 1.

• The term “corpus linguistics” was first used in the 1980s (Leech).
• Language corpora BC: a methodology used before computers, in the pre-Chomskyan period, by Boas, Sapir, Newman, Bloomfield and Pike (Biber & Finegan 1991) or Otto Jespersen (Svartvik).
• Severely criticized for ‘skewedness’ and marginalized in the Chomskyan era.
• Nowadays a mainstream methodology used in many branches of linguistics; its introduction to language studies has been compared to the invention of the telescope in astronomy (Stubbs).

• “A large collection of authentic texts that have been gathered in electronic form according to a specific set of criteria” (Bowker and Pearson 2002: 9).
• A machine-readable, representative collection of naturally occurring language assembled for the purpose of linguistic analysis.

Not a homogeneous methodology, but one with some shared features:
• the analysis is based on a corpus of naturally occurring language which is machine-readable, so that the retrieval of search patterns is computerized;
• the corpus is intended to be balanced and representative of the modality/register/variety it researches;
• the analysis is systematic and exhaustive, i.e. the corpus is not just a database of examples from which some can be chosen ad libitum and others neglected; the whole corpus is taken into consideration (Gries 2006: 4).
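
The first and third features (computerized retrieval, and exhaustive rather than selective analysis) can be illustrated with a minimal keyword-in-context (KWIC) search. This is only a sketch in Python: the function name `kwic`, the `width` parameter and the plain-string corpus are illustrative assumptions, not part of Gries's description, and real concordancers work on tokenised, often annotated corpora.

```python
import re


def kwic(text, pattern, width=30):
    """Return every match of `pattern` in `text` with its context.

    Each hit is a (left context, match, right context) tuple, where the
    contexts are at most `width` characters long. All matches are
    returned, so no example can be silently left out.
    """
    hits = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits
```

Calling `kwic(corpus_text, r"\bcourt\b")` on a corpus held in a string returns one tuple per occurrence, in the order in which the occurrences appear in the text.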

1. “The observer must not influence what is observed”.
2. “Repeated events are significant”.
Corpora show what is “central and typical, normal and expected”; they emphasise that language use is highly patterned and that such patterns are not accidental but cognitively motivated (Stubbs 2004: 111).

• CL promotes the theory of “meaning as use” as developed by Wittgenstein, Austin and Firth (Stubbs 2004: 110).
• Corpora take linguistics “beyond the single word as the basic semantic unit” (Teubert 2002: 212), posing an important theoretical question as to the minimum linguistic unit. Reorientation from single words to recurrent semantic conglomerates.
• Taylor: it might be more appropriate to use the term ‘mental phrasicon’ rather than ‘mental lexicon’, since the linguistic description of words must include the constructions they occur in (2006: 575).
• Beaugrande: corpora ‘mediate against the decontextualising’ (1996: 527).

• Reduced speculation and subjectivity (counting “is the least contestable mode of analysis”, Holmes);
• authenticity of data;
• the potential to verify research hypotheses systematically and on the basis of more extensive linguistic material;
• less error-prone: “computers provide consistent, reliable analyses; they do not change their minds or become tired” (Biber and Jones).

• Problems with representativeness and balance:
  • Any claims and generalisations we make about language are representative of the language sample we research, not of the entire language.
  • Stubbs: “any corpus is a compromise between the desirable and the feasible” (2004: 113).
• Limited availability of corpora, esp. for minority languages
• More difficult to apply in the case of inflectional languages
• Copyright issues
• Limited functionality of parallel corpus software

• It gives priority to observation over intuition (Stubbs 2004: 107-8).
• It is classified as an empirical approach to the description of language use which analyses authentic data and is inductive/data-driven in that it formulates theoretical statements from observations of actual use (Tognini-Bonelli 2001: 2).
• It is mainly a quantitative method, but it also integrates qualitative analysis to hypothesise about the data provided by the corpus and to form generalisations about language use (quantitatively driven qualitative analysis); in this sense it is complementary to an intuition-based approach (McEnery et al. 2006: 7), or to what Fillmore calls armchair linguistics (Stubbs 2004: 108).
• In contrast to computational linguistics, corpus linguistics is interested not so much in frequencies (how often) per se as in tendencies, in order to explain why certain regularities occur (Beaugrande 1996: 514).

Controversy over the status of CL:
• The ‘orthodox’ strand, the corpus-driven (Tognini-Bonelli 2001) or neo-Firthian tradition, ‘corpus as theory’ (Hardie and McEnery 2010), sees corpus linguistics as an independent discipline (Hardie and McEnery 2010: 384–386) with ‘a theoretical status’ (Tognini-Bonelli 2001: 1).
• The other, more dominant strand, referred to as the corpus-based (Tognini-Bonelli 2001) or methodologist tradition, ‘corpus as method’ (Hardie and McEnery 2010), regards corpus linguistics as a set of systematic methods and principles, i.e. a methodology, applied in various branches of linguistics and using various theoretical frameworks to explain corpus data (Hardie and McEnery 2010: 384–386). As emphasised by Johansson, corpus linguistics “is not defined by the object of study... the object of corpus linguistics is not the study of corpora. It is rather the study of language through corpora” (qtd. in Kenny 2001: 23).

• In which areas of language studies may corpora be applied?
• Corpus linguistics versus computational linguistics

• How are corpora used in Translation Studies?

• Developed in the mid-1990s; pioneered by Baker, who proposed analysing translations against non-translated texts to identify the distinctive features of translated texts (1996: 176).
• A consequence of polysystem theory and Descriptive Translation Studies.
• Corpus-based translation studies mark a shift from the analysis of the ST-TT relation (i.e. equivalence, accuracy) to TTs as independent texts in their own right, emphasizing the importance of translated texts in receiving cultures.
CBTS (Hunston 2002):
• Theoretical approach → translation process, TT against TL
• Practical approach → corpora used in translator training as a reference tool and in software development (machine translation and CATs)

• Hybrid language (Trosborg 1996 and 1997; Schäffner & Adab 2001); third code (Frawley 1984); third language (Duff); translationese (Baker); translanguage (Al Khafaji 2007).
• The shared claim: translated language differs from non-translated language.
• ST INTERFERENCES: Translationese is the language of translation (target text), usually understood pejoratively, as defined by Doherty: “translation-based deviations from target language conventions” (qtd. in Olohan 2004: 29); also confirmed by Olohan: “translated language that appears to be influenced by the source language, usually in an inappropriate way or to an undue extent” (2004: 90). In addition to being seen as a deviation from the norm resulting from SL interference (an infringement of TL naturalness), it is also understood as an intermediary between the source text/language and the target text/language.

• Baker: translation as “a mediated communicative event” (1993: 243) “shaped by its own goals, pressures and context of production” (1996: 176);
• as a result, “the language of translation may be characterized by specific, identifiable features that may be related to the nature of the translation activity itself” (Olohan 2004: 90).
• Baker refers to such features as “translation universals”. They stem from the idea that the translation process is affected by more or less universal influences, which are more subtle than SL interference → they are not reflected “in odd forms with regard to TL of the non-existing type (i.e. deviations from the code proper), but […] in odd forms of the unusual type, which are deviations from the norm of usage” (Toury 1979: 226).

• Simplification (the idea that translators subconsciously simplify the language or message or both)
• Explicitation (the tendency to spell things out in translation, including, in its simplest form, the practice of adding background information)
• Normalisation or conservatism (the tendency to conform to patterns and practices which are typical of the target language, even to the point of exaggerating them)
• Levelling out (… the tendency of translated text to gravitate around the centre of any continuum rather than move towards the fringes; …[it] means that we can expect to find less variation among individual texts in a translation corpus than among those in a corpus of original texts; […] translated texts seem to be less idiosyncratic, or more similar to each other, than original texts) (Baker 1996: 176-7).
• Over- and under-representation of source or target language elements

• The very term ‘universal’. As emphasised by Chesterman, “Genuine universals are the subject of unrestricted hypotheses: these claims aim to be valid for all translations of all kinds, in all times and places, universally” (2004: 9), which makes them difficult, if not impossible, to prove. The term ‘universal’ seems too radical to many researchers, who tend to equate it with absolute universals, contrary to what is generally accepted in the Greenberg tradition of research on language universals, which are treated as general tendencies not necessarily shared by all languages (Mauranen 2007: 34–35). In contrast to Baker’s strong form of the hypotheses, in their weaker version translation universals are treated as tendencies (Hatim and Munday 2004: 8).
• Imprecise definitions of universals and lack of conceptual clarity; for example, Pym finds universals theoretically ‘nebulous’ (2010: 81), Chesterman points to the imprecise definition of unique items (2007), while Becher criticises a number of explicitation studies for failing to define explicitation or apply it consistently (2010: 7).

• Anglocentrism of research on universals. Most supporting evidence comes from English and closely related European languages; to prove that a feature is universal, evidence from ‘genetically’ distant non-European languages is vital (Xiao 2010: 7, Mauranen 2007: 45).
• Baker’s claim that universals arise solely from the constraints of the translation process and are not affected by interference, culture and norms of translation. Most studies which provide supporting evidence for the existence of translation universals are conducted solely with comparable corpora, which precludes the assessment of ST interference and other factors. Some researchers suggest that ‘universal’ features may well be a result of norms (Malmkjær 2007: 57, see also Kenny 2001: 53). Halverson adds that investigations into translation universals must not disregard the language pair, since translated language is shaped by asymmetries in SL and TL semantic networks (2003: 224). House argues that they are language universals applicable to translation rather than autonomous translation universals (2008: 11). Her counterarguments include evidence that features of translated texts may differ depending on the language pair, directionality of translation and genre; for example, a higher degree of explicitation was found in German translations of popular science texts than in economic texts (2008: 12).

• House: “the quest for translation universals is in essence futile, i.e. that there are no, and there can be no, translation universals” (2008: 11).
• Chesterman, who is sceptical as to the very possibility of proving the hypotheses, takes the opposite view of their usefulness: “What ultimately matters is perhaps not the universals, which we can never finally confirm anyway, but new knowledge of the patterns, and patterns of patterns, which help us to make sense of what we are looking at” (2004: 11).
• Toury is interested not so much in the existence of universals ‘in the world’ as in their ‘explanatory power’, i.e. their ability to provide probabilistic explanations of translation practices (2004: 29).
• Another spin-off is a revival of linguistic methods in translation studies, which were shelved during the cultural turn.

• Explicitation is defined by Baker as “an overall tendency to spell things out rather than leave them implicit in translation”, which may be manifested at various levels, e.g. greater length of translations, optional ‘that’ in reported speech, which may be found more frequently in translations than in original texts, and overuse of explanatory vocabulary and conjunctions (1996: ).
• The rising explicitness in translationese has been noted by Blum-Kulka (1986: 21), who also noted that “the process of interpretation performed by the translator on the source text might lead to a TL text which is more redundant than the SL text” (1986: 19). According to Toury (1991b: 51), explicitation appears in “all kinds of mediated events, including interaction in a foreign language” (qtd. in Baker 1993: 244).
• spółka jawna (PL) → registered partnership (EN)

• Simplification is the translator’s inclination “to simplify the language used in translation” (Baker 1996: 181). Also referred to as disambiguation (Baker 1993: 244).
• For example, it may be found in the splitting of long sentences into shorter ones or in the clarifying use of punctuation.
• Baker: simplification is in a way connected with explicitation: “simplification involves making things easier for the reader (but not necessarily more explicit), but it does tend to involve also selecting an interpretation and blocking other interpretations, and in this sense it raises the level of explicitness by resolving ambiguity” (1996: 182).
• Lower lexical density in translated texts (Laviosa), i.e. more grammatical words at the expense of lexical words, which makes text processing easier.
• Lower type-token ratio (the ratio of distinct word forms to running words, i.e. how varied the vocabulary is): less varied vocabulary.
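
The two measures named in the last two bullets can be computed along the following lines. This is a minimal sketch, not Laviosa's procedure: the tokeniser is a simple regular expression, and `FUNCTION_WORDS` is a hypothetical, deliberately tiny stop list standing in for a proper inventory of grammatical words (a real study would use a full list or part-of-speech tagging). Note also that the type-token ratio falls as texts get longer, so it should only be compared across samples of equal size.

```python
import re

# Hypothetical mini stop list of English grammatical (function) words.
# It is an assumption made for illustration, not a complete inventory.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "by", "for", "with", "is", "are", "was", "were", "be", "that", "it",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters only,
    with an optional internal apostrophe)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        raise ValueError("cannot compute a ratio for an empty text")
    return len(set(tokens)) / len(tokens)


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are lexical words, i.e. not in the stop list."""
    if not tokens:
        raise ValueError("cannot compute a ratio for an empty text")
    lexical = sum(1 for token in tokens if token not in function_words)
    return lexical / len(tokens)
```

Under the simplification hypothesis one would expect both values to be lower, on average, in a translated subcorpus than in a comparable non-translated one.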

• Disambiguation versus strategic ambiguity and the purposeful flexibility (vagueness) of legal language
• Northcott & Brown (2006: 362): “One problem for the translator can be in misunderstanding the deliberate intention to retain ambiguity which can lead to an attempt to make the term more precise and limit possible interpretations by the court. Translators have no authority to resolve ambiguities in source texts. However, this can be brought about inadvertently if translators do not have sufficient legal and linguistic expertise.”

• This Consultant Agreement is made, executed and entered into this 17th day of October, 2007 at West Village, California, by and between...
• You agree that you will not during or after your employment disclose or make use of or exploit any confidential information.
• At all times throughout the course of your employment
• For and on behalf of the Company

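• Protokół z przebiegu posiedzenia Rady Nadzorczej [Minutes of the proceedings of the meeting of the Supervisory Board]
• Nieruchomości mogą być zbywane w drodze publicznej licytacji [Real property may be disposed of by way of public auction]
• Wynagrodzenie to powinno być wypłacone nie później niż w terminie roku od dnia dokonania przekształcenia lub dnia powzięcia uchwały [This remuneration should be paid no later than within the period of one year from the day of the transformation or the day the resolution was adopted]
• Realizację wyżej wymienionych zasad zapewnia stosowanie przez PKO BP zaawansowanych metod zarządzania ryzykiem kredytowym, zarówno na poziomie pojedynczych ekspozycji kredytowych, jak i na poziomie całego portfela kredytowego Banku. [The implementation of the above-mentioned principles is ensured by PKO BP’s use of advanced credit risk management methods, both at the level of individual credit exposures and at the level of the Bank’s entire credit portfolio.]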

• EN legal aid and advice → PL prawo do przedstawiciela z urzędu i bezpłatnego doradztwa prawnego [right to an officially appointed representative and free legal advice] (European Convention on the Exercise of Children’s Rights, Strasbourg, 25 January; DGT glossary)
• Proof of power of attorney → dokument potwierdzający udzielenie pełnomocnictwa / ustanowienie pełnomocnika [document confirming the granting of a power of attorney / the appointment of an attorney] (*zaświadczenie o nadaniu pełnomocnictwa)
• We reserve a right → *rezerwujemy sobie prawo

• Relation between the TT and non-translated target language
• How translations “fit” non-translated language
• A measure of linguistic distance: overrepresentation and underrepresentation of lexicogrammatical patterns in translations
• Normalisation
• Levelling out
• Unique items
• Untypical collocations

• Normalisation: “a tendency to exaggerate features of the target language and to conform to its typical patterns” (Baker 1996: 183).
• Interpreters tend to round off unfinished sentences and remove grammatical errors or false starts (Shlesinger 1991: 150); avoidance of experiments in punctuation use (Baker 1996: 184).
• ‘day’ and ‘say’ appear more frequently in translated English (from Arabic) than in non-translated English (Baker 1993: 244).
• Kenny’s sanitisation: a sanitised version of the original
• Resembles Toury’s law of growing standardisation

• Levelling out: “the tendency of translated text to gravitate towards the centre of a continuum. […] It involves steering a middle course between any two extremes, converging towards the centre” (Baker 1996: 184). For example, lexical density, type-token ratio and mean sentence length are similar across translated texts, while in non-translated ones they have higher variance (ibid.: 184).
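
The variance comparison in this example can be sketched as follows: compute one measure per text, then compare its spread across the translated and the non-translated subcorpus. The helper names `spread` and `less_varied` are illustrative assumptions, and the figures are invented for demonstration only, not real data. A lower standard deviation in one small sample is an observation, not proof; a real study would add a significance test.

```python
from statistics import mean, stdev


def spread(values):
    """Return (mean, sample standard deviation) of one measure taken
    over the individual texts of a subcorpus."""
    if len(values) < 2:
        raise ValueError("need at least two texts to estimate variation")
    return mean(values), stdev(values)


def less_varied(translated, originals):
    """True if the measure varies less across the translated texts
    than across the non-translated ones."""
    return spread(translated)[1] < spread(originals)[1]


# Invented per-text type-token ratios, for illustration only.
translated_ttr = [0.50, 0.52, 0.48]
original_ttr = [0.30, 0.50, 0.70]

print(spread(translated_ttr), spread(original_ttr))
print(less_varied(translated_ttr, original_ttr))
```

The same two functions work unchanged for lexical density or mean sentence length, since they only need one number per text.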

• Distributions of language items in translations are different from their distributions in spontaneous text in the same language (Mauranen 2002):
  • translations favour combinations which are possible in the target language system, but rare or absent from actual texts;
  • translations often have few or no instances of combinations that are frequent in target language texts.
• Underrepresentation of unique items: inspired by Reiss’s observation of ‘missing words’ in translated language due to translators’ failure to exploit the full potential of the TL linguistic resources. Unique items are TL linguistic features without straightforward counterparts in the SL; they may be lexical, phraseological, syntactic or textual, and “they do not readily suggest themselves as translation equivalents, as there is no obvious linguistic stimulus for them in the source text”. If unique items are underrepresented, translations are perceived as ‘less normal’ (Tirkkonen-Condit 2002, 2004).
• Halverson (2003): gravitational pull: overrepresentation of high-frequency lexical items; underrepresentation of non-shared elements
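
Over- and underrepresentation of an item can be checked by comparing its normalised frequency in a translation corpus with that in a non-translated reference corpus of the same language. The sketch below uses a plain frequency ratio, chosen here for simplicity and not taken from the studies cited above; the function names and the per-million normalisation are illustrative assumptions. Corpus studies usually add a significance measure on top of such a ratio.

```python
def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus size must be positive")
    return count / corpus_size * 1_000_000


def representation_ratio(count_trans, size_trans, count_ref, size_ref):
    """Ratio of the item's normalised frequency in translations to its
    normalised frequency in the non-translated reference corpus.

    A value above 1 points to overrepresentation in translations,
    a value below 1 to underrepresentation.
    """
    if count_ref == 0:
        raise ValueError("item absent from reference corpus; ratio undefined")
    return per_million(count_trans, size_trans) / per_million(count_ref, size_ref)
```

For a unique item, the hypothesis above predicts a ratio clearly below 1; for the high-frequency items in Halverson's gravitational pull, a ratio above 1.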

• Any unique items that come to your mind for the EN>PL and PL>EN directions of translation?

1. Cognitive processing that produces translation: causes should be sought in human cognition.
2. Translation as a communicative act and the translator’s awareness of his socio-cultural role as mediator of messages for new readers (cf. Klaudy).
→ We seek generalisations; “what ultimately matters is perhaps not the universals, which we can never finally confirm anyway, but new knowledge of the patterns, which help us to make sense of what we are looking at.” (Chesterman 2004: 11).

