University of Bologna, Italy Aston Corpus Symposium 2009

Presentation transcript:

Strategies, norms or universals? Investigating variation in translation. Silvia Bernardini, University of Bologna, Italy (silvia.bernardini@unibo.it). Aston Corpus Symposium 2009

Resuming… Last year’s talk:

Theoretical background
Target-oriented approach to the study of translation (Toury 1995):
- Focus on the TT within its context of fruition
- Identification of norms and laws of translation, e.g.
  Law of growing standardisation: more frequent target language options are preferred
  Law of interference: source text linguistic features are transferred onto the target text
- Descriptive rather than prescriptive/pedagogic focus
Corpus-based approach to the study of translation (Baker 1993, Olohan 2004)

Theoretical background “the most important task that awaits the application of corpus techniques in translation studies […] is the elucidation of the nature of translated text as a mediated communicative event. In order to do this, it will be necessary to develop tools that will enable us to identify universal features of translation, that is features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems”. (Baker 1993: 243)

Theoretical background
Tools: monolingual comparable corpora
- Originals in language A and translations into the same language from one or more other languages
Universal features (hypothesised), e.g.: explicitness, simplification, disambiguation, preference for conventional grammar, avoidance of repetition, normalisation…
Types of observations:
- Lower % of content vs. grammatical words (Laviosa 1998)
- Fewer contractions (Olohan 2003)
- Fewer TL-specific "unique items" (Tirkkonen-Condit 2004)
- …

Summary of old study: corpora
- 2 small monolingual comparable corpora of fiction text samples: one in English (original and translated from Italian), one in Italian (original and translated from English)
- 2 small parallel corpora: the translations from the corpora above, aligned to their source texts
- Reference corpora of English and Italian

Summary of old study: method
- Collect token frequencies from the reference corpora for all candidate collocation types observed in the monolingual comparable corpora
- Rank (MI/Fq) and compare rankings (Mann-Whitney ranks test)
- For significantly different rankings, analyse translation shifts at the parallel level
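The ranking and comparison step can be sketched roughly as follows. This is a minimal illustration, assuming that MI is the usual pointwise mutual information (log2 of observed over expected co-occurrence) and that the rank test is the standard Mann-Whitney U statistic; the slide does not say how either was computed, and the function names `mutual_information`, `average_ranks` and `mann_whitney_u` are hypothetical.

```python
import math


def mutual_information(pair_freq, freq_x, freq_y, corpus_size):
    """Pointwise MI of a word pair: log2(observed / expected co-occurrence)."""
    expected = freq_x * freq_y / corpus_size
    return math.log2(pair_freq / expected)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the pair (U_a, U_b) for two independent samples of scores."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a
```

In this sketch the two samples would be the MI (or frequency) scores of the collocations found in translated and in original texts. A significance level for U would normally come from a statistics package and is left out here.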

Summary of old study: findings
- MCC analysis: translated fiction texts (Italian and English) tend to be (overall) richer in collocations than original texts in the same language
- Parallel analysis: confirms that the differences are due to translation shifts rather than unrelated variables
- The data provide support for the law of growing standardisation

Moving on: technical translation Are results re: translation norms and strategies observed in fiction corpora confirmed by analyses of technical translation corpora? i.e., is there (more) evidence of Growing standardisation or Interference In translations compared to (comparable) originals?

Choosing an LSP Perl documentation Practical Extraction and Report Language Popular programming language Most communication happens in English Efforts to produce documentation (original and translated) in Italian Winning more people to the cause

Why perl? Initial stimulus: technical translation course at SSLMIT (1 year of MA) pod2it project Very favourable authentic conditions, near-experimental Neatly delimited topic/discourse community Both originals and translations drafted by area experts (not linguists)

Originals (En) and translations (It): perl pods (e.g.)
NAME
perlboot - Beginner's Object-Oriented Tutorial
DESCRIPTION
If you're not familiar with objects from other languages, some of the other Perl object documentation may be a little daunting, such as perlobj, a basic reference in using objects, and perltoot, which introduces readers to the peculiarities of Perl's object system in a tutorial way.
NOME
perlboot - Introduzione alla tecnologia Orientata agli Oggetti (titolo originale: Beginner's Object-Oriented Tutorial)
DESCRIZIONE
Se non avete già una certa familiarità con la tecnologia ad oggetti degli altri linguaggi di programmazione, parte della documentazione sulla OOP in Perl potrebbe essere un po' intimidatoria: perlobj, una guida di riferimento sull'utilizzo degli oggetti e perltoot che introduce il lettore alle particolarità della tecnologia ad oggetti del Perl con un taglio introduttivo.

Italian originals (e.g.)

Method: corpus design
Monolingual component: translated Italian texts (PERLTRIT) and original Italian texts (PERLORIT)
Parallel component: translated Italian texts (PERLTRIT) and the English source texts of the translated component (PERLOREN)

The perl corpus
PERLOREN: original English (STs of PERLTRIT)
PERLTRIT: translated Italian (TTs of PERLOREN)
PERLORIT: original Italian (comparable)
          PERLOREN   PERLORIT   PERLTRIT
tokens    298,346    305,537    321,405
types     18,639     22,495     22,768
texts     43 89
authors / translators     16 --- 30 11

Corpus preparation
- Download texts (plain txt)
- Record relevant meta-data (readme file): url, author, author's cv, notes
- Tag and lemmatise (TreeTagger)
- Align parallel component (EasyAlign)
- Index with the CWB

Assembling evidence
Research question: translated fiction texts (Italian and English) show evidence of growing standardisation (at the collocational level). Universal or norm/law-governed?
What happens in technical translation?
- Evidence of standardisation => support for the "universality" hypothesis
- Evidence of interference => support for the "norm/law" hypothesis

Assembling evidence
Look for differences between originals and translations in Italian that:
- could be interpreted as a consequence of either interference or standardisation
- are not (likely to be) the result of unrelated variables
- are sufficiently frequent in this technical field to allow confident judgement

Case study: borrowings and calques English words New Italian words based on English terms or new senses derived from English “false friends” English morphosyntactic marks (plural) More frequent in originals or translations?

Case study: borrowings and calques
- If 1 (more frequent in originals), then translators could be seen as conforming to TL "normal" use more than original authors of comparable texts => standardisation
- If 2 (more frequent in translations), then translators could be hypothesised to be more subject to interference from the SL than original authors of comparable texts => interference

Identifying foreign/calqued words in corpora Keywords each corpus is used in turn as a reference corpus All words (to identify borrowings) Verbs only (to identify calques) Words ending in –s To compare use of non-Italian morphological marks (unadapted borrowings)

1a Keyword analysis: all words
- Use one corpus as a reference corpus to highlight words that are significantly more frequent in the other
- Define what counts as a keyword: cut-off point 5; log-likelihood ordering; top 100 types
- Browse lists, select potential key-borrowings, check concordances
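A keyword extraction of this kind can be sketched as below. This is an illustration under stated assumptions, not the tool actually used for the talk: it applies the two-corpus log-likelihood statistic commonly used for keyness, and it reads the "cut-off point" as a minimum frequency in the study corpus, which the slide does not spell out. The names `log_likelihood` and `keywords` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-corpus log-likelihood for one word (0 when relative frequencies match)."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll


def keywords(study_tokens, reference_tokens, min_freq=5, top_n=100):
    """Words over-represented in the study corpus, highest LL first."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    size_s, size_r = len(study_tokens), len(reference_tokens)
    scored = []
    for word, freq in study.items():
        if freq < min_freq:
            continue  # assumed reading of the cut-off point
        if freq / size_s <= ref[word] / size_r:
            continue  # keep only words relatively more frequent in the study corpus
        scored.append((log_likelihood(freq, ref[word], size_s, size_r), word))
    scored.sort(reverse=True)
    return [(word, round(score, 1)) for score, word in scored[:top_n]]
```

Calling `keywords(perltrit_tokens, perlorit_tokens)` and then swapping the two arguments would give the two lists compared in the talk, where `perltrit_tokens` and `perlorit_tokens` stand for hypothetical token lists of the two subcorpora.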

Problems
- Most "keywords" identify topics (that's what keywords are meant to do, after all)
- Some signal differences between English/Italian writing strategies or possibly slight genre differences
For instance…

PERLORIT PERLTRIT

More borrowings in translated Italian than in original Italian…?
PERLTRIT            PERLTRIT (cont'd)   PERLORIT
178.4 package       65.6 local          131.0 script
148.2 match*        63.7 buffer         130.7 expression
94.6 char           54.9 point          123.2 regular
87.7 filehandle     54.4 record         118.7 array
83.7 locale         53.4 long           75.0 overloading
83.3 require        51.7 pack           54.1 print
72.3 unpack         50.5 thread         50.7 reference
66.9 socket         48.6 Encode         37.5 matching*
66.9 shift          46.5 pipe           34.1 Hello

Looking closer: PERLTRIT
Unrelated variables:
- Larger amount of code text: char, filehandle, shift, require, (un)pack
- Different topics: locale, encode, (code) point, long
- Morphological differences: match/matching
Dubious cases: socket, buffer, record, thread, pipe

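Alternatives? (definitions from Wikipedia, with English glosses)
Socket: "…anche chiamato zoccolo, è una tipologia di connettore utilizzata in elettronica" ("…also called zoccolo, it is a type of connector used in electronics"). Zoccolo: 0 occurrences in corpus
Buffer: "…letteralmente tampone: in italiano, memoria tampone o anche intermediaria, di transito" ("…literally tampone: in Italian, memoria tampone, or also intermediaria, di transito"). Tampone, intermediaria, di transito: 0 occurrences in corpus
Record: "In informatica il record è un oggetto di un database strutturato in dati che contiene un insieme di campi o elementi, ciascuno dei quali possiede nome e tipo propri." ("In computing, a record is an object in a database, structured into data, that contains a set of fields or elements, each with its own name and type.")
Thread: "Un thread o thread di esecuzione è una suddivisione di un programma in due o più task che vengono eseguiti in modo concorrente." ("A thread, or thread of execution, is a subdivision of a program into two or more tasks that are executed concurrently.")
Pipe: "Nei sistemi operativi una pipe è uno degli strumenti disponibili per far comunicare tra loro dei processi." ("In operating systems, a pipe is one of the tools available for letting processes communicate with each other.")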

One candidate left: package
            package   %      pacchetto   %      package + pacchetto   %
PERLTRIT    357       78.8   96          21.1   453                   100
PERLORIT    81        84.3   15          15.6
In fact, if anything, translations would seem to show a slight preference for "pacchetto" compared to original texts

Looking closer: originals… PERLORIT 131.0 script 130.7 expression 123.2 regular 118.7 array 75.0 overloading 54.1 print 50.7 reference 37.5 matching* 34.1 Hello

PERLORIT: regular expression
            regular expression   %      espressione regolare   %      reg. expr. + espr. reg.   %
PERLORIT    109                  50.9   105                    49.0   214                       100
PERLTRIT    10                   5.9    157                    94.0   167
Searches: [word="regular" %cd] [word="expressions?" %cd]; [lem="espressione" %cd] [lem="regolare" %cd];

PERLORIT: reference
            reference   %      riferimento   %      reference + riferimento   %
PERLORIT    88          38.2   142           61.7   230                       100
PERLTRIT    19          3.9    464           96.0   483
Searches: [word="references?" %cd]; [lem="riferimento" %cd];
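The percentages in comparison tables of this kind follow directly from the raw query counts. The sketch below recomputes them for the reference/riferimento counts on this slide; the helper name `share_table` is hypothetical and the code only illustrates the arithmetic.

```python
def share_table(counts):
    """Map corpus -> (borrowing_fq, italian_fq) to percentage shares and totals."""
    table = {}
    for corpus, (borrowed, native) in counts.items():
        total = borrowed + native
        table[corpus] = {
            "borrowing_pct": round(100 * borrowed / total, 1),
            "italian_pct": round(100 * native / total, 1),
            "total": total,
        }
    return table


# Counts taken from the slide: (reference, riferimento) per subcorpus.
reference_counts = {"PERLORIT": (88, 142), "PERLTRIT": (19, 464)}
shares = share_table(reference_counts)
```

Because this sketch rounds to one decimal place, some values may differ from the slide by 0.1 (the slide figures appear to be truncated rather than rounded).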

PERLORIT: hello (Hello world vs Ciao mondo)
Columns: hello | % | ciao | % | hello+ciao | %
PERLORIT 31 69 100
PERLTRIT 1 2.2 43 97.7 44
Searches: [word="hello?" %cd]; [word="ciao" %cd];

Summing up: 1a borrowings (all) The translated corpus contains more key-borrowings than the original corpus However, in most cases this is due to topic differences In no cases could we identify English words found in the translated corpus with alternative Italian renderings favoured in the original corpus On the other hand, at least 4 out of 8 key-borrowings found in the original corpus have alternative Italian renderings favoured in the translated corpus

1b Calqued verbs
- Verbs that are significantly more frequent in PERLORIT than in PERLTRIT and vice versa
- Cut-off point: 2; log-likelihood ordering; top 100 types
- Separate searches for:
  lemmas that are "unknown" to the tagger (to search for real calques)
  lemmas that are "not unknown" to the tagger (to search for existing Italian verbs with calqued meanings)
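The split between verbs with known and unknown lemmas could be approximated as below. This is a sketch assuming TreeTagger's tab-separated output (word, tag, lemma) with `<unknown>` as the lemma of out-of-lexicon forms, and assuming verb tags start with "V", as in the `pos="V.*"` queries used in the talk. The function name `split_verbs` and the sample lines (including their tags) are invented for illustration.

```python
from collections import Counter

UNKNOWN = "<unknown>"  # lemma TreeTagger prints for forms missing from its lexicon


def split_verbs(tagged_lines):
    """Count verbs by lemma (known) or by word form (unknown lemma)."""
    known, unknown = Counter(), Counter()
    for line in tagged_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip markup or malformed lines
        word, pos, lemma = parts
        if not pos.startswith("V"):
            continue  # keep verbs only
        if lemma == UNKNOWN:
            unknown[word.lower()] += 1  # candidate "real" calques
        else:
            known[lemma] += 1  # existing verbs, possibly with calqued senses
    return known, unknown


# Invented sample in TreeTagger-style format (tags are illustrative only).
sample = [
    "la\tDET:def\til",
    "funzione\tNOM\tfunzione",
    "ritorna\tVER:pres\tritornare",
    "ora\tADV\tora",
    "cicliamo\tVER:pres\t<unknown>",
]
known_verbs, unknown_verbs = split_verbs(sample)
```

Each of the two counters could then be passed to a keyness comparison between PERLORIT and PERLTRIT.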

Results
Known lemmas
  PERLORIT: ritornare (fq 90, LL 35.9); processare (fq 26, LL 15.6)
  PERLTRIT: uccidere (fq 6, LL 8.3)
Unknown lemmas
  PERLORIT: cicliamo (fq 2, LL 3.3); cicla (fq 2, LL 3.3); splittare (fq 3, LL 4.9)
  PERLTRIT: (none listed)

PERLTRIT: uccidere (un processo) (kill (a process))
PERLTRIT> [lem="uccidere"];
<perlfaq8>: il segnale che ha <ucciso> il processo
  --> perloren: the signal the process died from
<perlfork>: <Uccidere> il processo genitore
  --> perloren: Killing the parent process
<perlfork>: genitore viene <ucciso>(usando la funzione kill( )
  --> perloren: process is killed (either using Perl's kill( ) builtin
<perlipc>: {HUP} ad 'IGNORE' per evitare di <uccidere> sé stesso)
  --> perloren: $ SIG{HUP} to IGNORE so it doesn't kill itself)
<perlipc>: "fork( )" e "exec( )", ed <uccidere> i processi figli
  --> perloren: fork( ) and exec( ), and kill the errant child process.
<perlthrtut>: probabilmente si bloccherà finché non lo <uccidete>.
  --> perloren: This program will probably hang until you kill it .
kill + inanimate object in ukWaC-01: game (14), process (2), security (2), NHS (2), soul (2), flu (2), time (2), …
uccidere + inanimate object in itWaC3-01: musica (5, music), speranza (5, hope), amore (4, love), concorrenza (3, competition), innocenza (3, innocence), percezione (3, perception), realtà (3, reality), …

PERLORIT: ritornare (selected) (return)
<corso>: testuale mentre exit <ritorna> solo un codice nume
<Dalla_shell_al_web>: ript; <ritornando> poi la struttura re
<frameperl>: Tale funzione <ritorna> 0 sei il comando è
<frameperl>: exec che però non <ritorna> alcun valore. La
<javaperl>: metodo / accept( )/ <ritorna> una istanza della
<javaperl>: la funzione <ritornerebbe> un valore vero per
<mb_corso_perl_5_print>: slash ( \ ) <ritorna> una reference
<mostraLezione.php_puglisi>: iavi e le <ritorna> assemblate
<Perl_Tutorial>: ) ; viene <ritornato> vero A dire il vero
<Perl_Tutorial>: L' espressione $cibo[ 2 ] <ritorna> uva.
Fq PERLORIT 90, Fq PERLTRIT 28
NB: [lem="ritornare"] [pos="N.*"]: Fq PERLTRIT 0, Fq PERLORIT 16
Alternatives: restituire, produrre, …

PERLTRIT: ritornare (selected) (return)
<scopo_dello_scope>: il seme, e <ritorna> il risultato
<perlboot>: classe per <ritornare> a questo package.
<perlembed>: esaminare i valori <ritornati>, avrete
<perlfaq>: mai exec( ) non <ritorna>? Si possono fare
<perlfaq6>: di matching <ritorna> le coppie che ha tr
<perlfaq9>: he gli errori fatali <ritornino> al browser
<perlfork>: processo; il figlio <ritorna> dalla fork( )
<perlfunc>: di sistema e non <ritorna>, usate "system"
<perlfunc>: ESPR return <Ritorna> da una subroutine ,
<perlipc>: ostra FIFO. chdir; <ritorna> a casa $FIFO =
Fq PERLORIT 90, Fq PERLTRIT 28
NB: [lem="ritornare"] [pos="N.*"]: Fq PERLTRIT 0, Fq PERLORIT 16
Alternatives: restituire, produrre, …

PERLORIT: processare (selected) (process)
<coisson_puntata72>: nga adatta ad essere <processata> dalla shell dei
<eb_irc_check>: specificato , verrà <processato> dalla funzione on_l
<e_solo_fortuna_printable>: codice viene <processato> con un foglio
<introduzione_al_printable>: il software deve <processare> il testo
<mb_corso_perl_10_print>: truzioni, essa <processa> tutti gli elementi
<mb_corso_perl_10_print>: e di <processarlo> con il seguente cod
<mod_perl1tutorial_print>: infatti <processerà> tutte le direttive
<Perl_Tutorial>: che crei o comunque <processi> pagine html, sorge
<sostituire_ma_c_printable>: il nostro script <processa>, invece di
<tegels_usare_il_perl>: file di log viene <processata>. La variabile
Fq PERLORIT 26, Fq PERLTRIT 5
Alternatives: elaborare, manipolare…

PERLTRIT: processare (process)
PERLTRIT> [lem="processare"];
<perlfaq8>: poiché la shell <processa> le redirezioni
<perlfunc>: output vengono <processati> (consultate
<perlfunc>: a finire in $var <processa> la lista degl
<perlthrtut>: riato affinché venga <processato> . Una
<perlvar>: routine per <processare> gli avvertimenti
Fq PERLORIT 26, Fq PERLTRIT 5
Alternatives: elaborare, manipolare…

PERLORIT: ciclare (cycle)
PERLORIT> [word="cicl.*" & pos="V.*"];
<coisson_puntata71>: inviati); ora <cicliamo> sull' array
<garau_guida_perl>: consente di <ciclare> un determinato blo
<perl_tutorial_sciabarra>: il foreach <cicla> su un array e
<sostituire_ma_c_printable>: Perl <cicla> linea per linea e
<tegels_usare_il_perl>: aperto, <cicliamo> attraverso le sue
Fq PERLORIT 5, Fq PERLTRIT 0
Alternatives: iterare

PERLORIT: splittare (split)
PERLORIT> [word="splitt.*"];
<perl_valsesia>: in cui <splittare> il pattern.
<perl_valsesia>: si può voler <splittare> una linea
<soltanto_un_alt_printable>: <splittato> e passato
<Split_in_perl>: "<splittare>" cioè dividere una str
Fq PERLORIT 4, Fq PERLTRIT 0
Alternatives: dividere, separare

Summing up: calques The comparative analysis of key verbs in the original and in the translated subcorpora suggests that authors are more at ease with the use of English (technical) calques than translators.

2. -s words Search for words ending in –s in original Italian and translated Italian (fq >1) Select from output only plurals (unadapted borrowings) used (rather than quoted) in Italian discourse in the two sub corpora Which corpus displays greater use of unadapted borrowings ending in –s?

Words ending in –s
Search: [word="[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s"];
PERLORIT (95 types, 711 tokens): warnings 69, Windows 56, unless 54, Mongers 38, Associates 31, keys 20, SomeClass 19, files 16, alias 15, Class 14, …
PERLTRIT (144 types, 1000 tokens): unless 85, this 60, bless 46, alias 39, exists 38, threads 37, Windows 36, warnings 34, Class 24, vars 24, …
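Outside CQP, the same search can be approximated with an ordinary regular expression over a token list. The sketch below reuses the pattern from the slide and assumes, as CQP does for attribute values, that the pattern must match the whole token. The function name `s_final_words` is hypothetical, and the frequency threshold mirrors the "fq > 1" condition mentioned earlier in the talk.

```python
import re
from collections import Counter

# Pattern copied from the slide; fullmatch() anchors it to the whole token.
S_WORD = re.compile(r"[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s")


def s_final_words(tokens, min_freq=2):
    """Return {word: frequency} for s-final tokens occurring at least min_freq times."""
    counts = Counter(t for t in tokens if S_WORD.fullmatch(t))
    return {word: freq for word, freq in counts.items() if freq >= min_freq}


# Invented token list for illustration.
tokens = ["files", "files", "unless", "script", "files", "is", "threads", "threads"]
hits = s_final_words(tokens)
n_types, n_tokens = len(hits), sum(hits.values())
```

As the slide lists show, such a search also returns many items that are not plural borrowings (`unless`, `this`, `alias`), so the output still needs the manual selection step described in the talk.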

Results from the PERLIT corpus
PERLORIT (word, fq): files 16, subroutines 10, backquotes 6, scripts 4, forms 4, links 4, expressions 3, cookies 3, references 2
PERLTRIT (word, fq): backticks 2, closures 1

PERLORIT: “files” perlorit perltrit

PERLORIT: “forms” perlorit perltrit

PERLTRIT: closures and backticks
<perlmod>: riguardo alle chiusure [<closures>, N.d.T.].
<perlref>: come le <closures> [ letteralmente " chiusure
<perlfaq8>: system( ) con quello dei <backticks> (`).
<perlfaq8>: uscita). I <backticks> (``) lanciano il coma
<perlfaq8>: shell, con i <backticks> ciò non è possibile.

Summing up: unadapted borrowings Despite superficial quantitative evidence (higher numbers of types and tokens for words ending in –s in translated than in original corpora), translators appear to disfavour unadapted borrowings ending in –s with respect to original authors

General conclusion Results of study 2 lend support to conclusions of study 1: In both fiction translation and technical translation, Despite differences in translator profile, translation “commission”, topic, genre, readership etc., And regardless of differences in methodological design/object of corpus study…

General conclusions The law of growing standardization seems to predominate over the law of interference (in present-day translation practice between English and Italian etc. etc.) Two small steps toward the bottom-up identification of universal trends…

General conclusions The lessons to be learnt Relying on superficial quantitative data in the search for translation universals can be very misleading Insights and hypotheses should emerge from the accumulation of results of (painstaking) analyses conducted on closely comparable corpora, checked against their parallel text component(s) and/or taking into account alternatives offered by the target language

Thank you

References
Baker, M. 1993. "Corpus linguistics and translation studies. Implications and applications". In Baker, M., G. Francis and E. Tognini-Bonelli (eds.), Text and Technology. Benjamins. 233-250.
Laviosa, S. 1998. "Core patterns of lexical use in a comparable corpus of English narrative prose". Meta 43(4). 557-570.
Olohan, M. 2003. "How frequent are the contractions? A study of contracted forms in the translational English corpus". Target 15(1). 59-89.
Olohan, M. 2004. Introducing Corpora in Translation Studies. Routledge.
Pym, A. 2008. "On Toury's laws of how translators translate". In Pym, A., M. Shlesinger and D. Simeoni (eds.), Beyond Descriptive Translation Studies. Benjamins. 311-328.
Tirkkonen-Condit, S. 2004. "Unique items – over- or under-represented in translated language?". In Mauranen, A. and P. Kujamäki (eds.), Translation Universals. Benjamins. 177-184.
Toury, G. 1995. Descriptive Translation Studies and Beyond. Amsterdam: Benjamins.

Recent critiques “Baker (1995: 235), re-affirmed by Olohan (2004: 43), argues that translations can be studied by comparing them with non-translations in the same language, without focusing on source texts or source languages. This means we can describe translational English in opposition to non-translational English, doing all the research on English. The result is perhaps the major methodological advance associated with corpus studies. It has many economic advantages: it cuts out all the bother of learning foreign languages and cultures; it controls numerous tricky variables associated with suspicions of linguistic and cultural relativism. In the English-only research on optional that, there is thus strictly no way of knowing about any kind of foreign interference causing the frequencies of the linguistic variable, since in principle the source texts are not in the corpus”. [Pym 2008, p. 14 of pre-print version]