University of Bologna, Italy Aston Corpus Symposium 2009


1 Strategies, norms or universals? Investigating variation in translation
Silvia Bernardini
University of Bologna, Italy
Aston Corpus Symposium 2009

2 Resuming… Last year’s talk:

3 Theoretical background
Target-oriented approach to the study of translation (Toury 1995)
  Focus on the TT within its context of fruition
  Identification of norms and laws of translation, e.g.
    Law of growing standardisation: more frequent target language options are preferred
    Law of interference: source text linguistic features are transferred onto the target text
  Descriptive rather than prescriptive/pedagogic focus
Corpus-based approach to the study of translation (Baker 1993, Olohan 2004)

4 Theoretical background
“the most important task that awaits the application of corpus techniques in translation studies […] is the elucidation of the nature of translated text as a mediated communicative event. In order to do this, it will be necessary to develop tools that will enable us to identify universal features of translation, that is features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems”. (Baker 1993: 243)

5 Theoretical background
Tools
  Monolingual comparable corpora: originals in language A and translations into the same language from one or more other languages
Universal features (hypothesised), e.g.: explicitness, simplification, disambiguation, preference for conventional grammar, avoidance of repetition, normalisation…
Types of observations
  Lower % of content vs. grammatical words (Laviosa 1998)
  Fewer contractions (Olohan 2003)
  Fewer TL-specific “unique items” (Tirkkonen-Condit 2004)

6 Summary of old study: corpora
2 small monolingual comparable corpora of fiction text samples
  One in English (original and translated from Italian)
  One in Italian (original and translated from English)
2 small parallel corpora
  The translations from the corpora above, aligned to their source texts
+ Reference corpora of English and Italian

7 Summary of old study: method
Collect token frequencies from reference corpora for all candidate collocation types observed in the monolingual comparable corpora
Rank (MI/Fq) and compare rankings (Mann-Whitney ranks test)
For significantly different rankings, analyse translation shifts at the parallel level
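The ranking-and-comparison step can be illustrated with a minimal Python sketch. It assumes MI means pointwise mutual information (log2 of observed over expected co-occurrence) and computes only the Mann-Whitney U statistic, not its significance; the function names and all numbers are hypothetical and are not taken from the study.

```python
import math


def mutual_information(pair_fq, w1_fq, w2_fq, corpus_size):
    """Pointwise MI: log2 of observed vs. expected co-occurrence frequency."""
    expected = w1_fq * w2_fq / corpus_size
    return math.log2(pair_fq / expected)


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a; tied values receive their average rank."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Find the end of the run of tied values starting at position i.
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a = len(sample_a)
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical MI scores for collocations found in translated vs. original texts.
translated_scores = [6.1, 5.4, 7.0, 4.9]
original_scores = [3.2, 4.1, 5.0, 2.8]
u_value = mann_whitney_u(translated_scores, original_scores)
print(u_value)
```

U ranges from 0 to len(sample_a) * len(sample_b); deciding whether a given U is significant still requires critical-value tables or a statistics library.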

8 Summary of old study: findings
MCC analysis: translated fiction texts (Italian and English) tend to be (overall) richer in collocations than original texts in the same language
Parallel analysis: confirms that the differences are due to translation shifts rather than unrelated variables
The data provide support for the law of growing standardisation

9 Moving on: technical translation
Are results re: translation norms and strategies observed in fiction corpora confirmed by analyses of technical translation corpora?
i.e., is there (more) evidence of growing standardisation or of interference in translations compared to (comparable) originals?

10 Choosing an LSP: Perl documentation
Practical Extraction and Report Language
Popular programming language
Most communication happens in English
Efforts to produce documentation (original and translated) in Italian
Winning more people to the cause

11 Why Perl?
Initial stimulus: technical translation course at SSLMIT (1 year of MA)
pod2it project
Very favourable authentic conditions, near-experimental
  Neatly delimited topic/discourse community
  Both originals and translations drafted by area experts (not linguists)

12 Originals (En) and translations (It) (e.g.): perl pods
Original (En):
NAME
perlboot - Beginner's Object-Oriented Tutorial
DESCRIPTION
If you're not familiar with objects from other languages, some of the other Perl object documentation may be a little daunting, such as perlobj, a basic reference in using objects, and perltoot, which introduces readers to the peculiarities of Perl's object system in a tutorial way.
Translation (It):
NOME
perlboot - Introduzione alla tecnologia Orientata agli Oggetti (titolo originale: Beginner's Object-Oriented Tutorial)
DESCRIZIONE
Se non avete già una certa familiarità con la tecnologia ad oggetti degli altri linguaggi di programmazione, parte della documentazione sulla OOP in Perl potrebbe essere un po' intimidatoria: perlobj, una guida di riferimento sull'utilizzo degli oggetti e perltoot che introduce il lettore alle particolarità della tecnologia ad oggetti del Perl con un taglio introduttivo.

13 Italian originals (e.g.)

14 Method: corpus design
Monolingual component
  Translated Italian texts (PERLTRIT)
  Original Italian texts (PERLORIT)
Parallel component
  English source texts of the translated component (PERLOREN)

15 The perl corpus
PERLOREN: Original English (STs of PERLTRIT)
PERLTRIT: Translated Italian (TTs of PERLOREN)
PERLORIT: Original Italian (comparable)
          PERLOREN   PERLORIT   PERLTRIT
tokens    298,346    305,537    321,405
types     18,639     22,495     22,768
texts 43 89
authors translators 16 --- 30 11

16 Corpus preparation
Download texts (plain txt)
Record relevant meta-data (readme file): url, author, author’s cv, notes
Tag and lemmatise (TreeTagger)
Align parallel component (EasyAlign)
Index with the CWB

17 Assembling evidence: research question
Translated fiction texts (Italian and English) show evidence of growing standardisation (at the collocational level)
Universal or norm/law-governed?
What happens in technical translation?
  Evidence of standardisation => support for the “universality” hypothesis
  Evidence of interference => support for the “norm/law” hypothesis

18 Assembling evidence
Look for differences between originals and translations in Italian that:
  could be interpreted as a consequence of either interference or standardisation
  are not (likely to be) the result of unrelated variables
  are sufficiently frequent in this technical field to allow confident judgement
?

19 Case study: borrowings and calques
English words
New Italian words based on English terms or new senses derived from English “false friends”
English morphosyntactic marks (plural)
More frequent in originals or translations?

20 Case study: borrowings and calques
If 1 (more frequent in originals), then translators could be seen as conforming to TL “normal” use more than original authors of comparable texts => standardisation
If 2 (more frequent in translations), then translators could be hypothesised to be more subject to interference from the SL than original authors of comparable texts => interference

21 Identifying foreign/calqued words in corpora
Keywords: each corpus is used in turn as a reference corpus
  All words (to identify borrowings)
  Verbs only (to identify calques)
Words ending in –s
  To compare use of non-Italian morphological marks (unadapted borrowings)

22 1a Keyword analysis: all words
Use one corpus as a reference corpus to highlight words that are significantly more frequent in the other
Define what counts as a keyword
  Cut-off point: 5
  Log-likelihood ordering
  Top 100 types
Browse lists, select potential key-borrowings, check concordances
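The keyword procedure can be sketched in Python as follows. This is a minimal illustration, not the tool used in the study: it assumes the common two-term log-likelihood keyness formula (observed vs. expected frequency in each corpus) and reads "cut-off point" as a minimum frequency, which the slides do not spell out. The function names are hypothetical.

```python
import math


def log_likelihood(fq_study, size_study, fq_ref, size_ref):
    """Two-term log-likelihood keyness score for one word."""
    total_fq = fq_study + fq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_fq / total_size
    expected_ref = size_ref * total_fq / total_size
    score = 0.0
    for observed, expected in ((fq_study, expected_study), (fq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(study_counts, ref_counts, min_fq=5, top_n=100):
    """Words relatively more frequent in the study corpus, ordered by LL."""
    size_study = sum(study_counts.values())
    size_ref = sum(ref_counts.values())
    scored = []
    for word, fq in study_counts.items():
        ref_fq = ref_counts.get(word, 0)
        # Assumed reading of the cut-off: skip words below a minimum frequency.
        if fq < min_fq:
            continue
        # Keep only words over-represented in the study corpus.
        if fq / size_study <= ref_fq / size_ref:
            continue
        scored.append((log_likelihood(fq, size_study, ref_fq, size_ref), word))
    scored.sort(reverse=True)
    return scored[:top_n]


# Frequencies of "ritornare" and corpus sizes as reported elsewhere in the slides.
print(round(log_likelihood(90, 305537, 28, 321405), 1))
```

The score obtained this way need not coincide with the LL values shown in the slides, since the exact formula, tokenisation and corpus sizes used by the original software are not stated.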

23 Problems
Most “keywords” identify topics (that’s what keywords are meant to do after all)
Some signal differences between English/Italian writing strategies or possibly slight genre differences
For instance…

24 PERLORIT PERLTRIT

25 More borrowings in translated Italian than in original Italian…?
Keywords (log-likelihood score, word):
PERLTRIT            PERLTRIT (cont’d)   PERLORIT
178.4 package       65.6 local          131.0 script
148.2 match*        63.7 buffer         130.7 expression
94.6 char           54.9 point          123.2 regular
87.7 filehandle     54.4 record         118.7 array
83.7 locale         53.4 long           75.0 overloading
83.3 require        51.7 pack           54.1 print
72.3 unpack         50.5 thread         50.7 reference
66.9 socket         48.6 Encode         37.5 matching*
66.9 shift          46.5 pipe           34.1 Hello

26 Looking closer: PERLTRIT
Unrelated variables
  Larger amount of code text: char, filehandle, shift, require, (un)pack
  Different topics: locale, encode, (code) point, long
  Morphological differences: match/matching
Dubious cases: socket, buffer, record, thread, pipe

27 Alternatives? Socket, Buffer, Record, Thread, Pipe
Socket: “…anche chiamato zoccolo, è una tipologia di connettore utilizzata in elettronica”
  Zoccolo: 0 occurrences in corpus
Buffer: “…letteralmente tampone: in italiano, memoria tampone o anche intermediaria, di transito”
  Tampone, intermediaria, di transito: 0 occurrences in corpus
Record: “In informatica il record è un oggetto di un database strutturato in dati che contiene un insieme di campi o elementi, ciascuno dei quali possiede nome e tipo propri.”
Thread: “Un thread o thread di esecuzione è una suddivisione di un programma in due o più task che vengono eseguiti in modo concorrente.”
Pipe: “Nei sistemi operativi una pipe è uno degli strumenti disponibili per far comunicare tra loro dei processi.”
Source: Wikipedia

28 One candidate left: package
            package   %      pacchetto   %      package + pacchetto   %
PERLTRIT    357       78.8   96          21.1   453                   100
PERLORIT    81        84.3   15          15.6   96                    100
In fact, if anything, translations would seem to show a slight preference for “pacchetto” compared to original texts
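The percentages for package vs. pacchetto follow directly from the raw counts, as this small Python sketch shows. It assumes percentages are truncated (not rounded) to one decimal, which is an inference from the reported figures (for example 96 out of 453 is 21.19…%, reported as 21.1); the helper name `share` is hypothetical.

```python
def share(count, total):
    """Percentage of count in total, truncated to one decimal place."""
    return (count * 1000 // total) / 10


# Counts as reported for the borrowing "package" and its rendering "pacchetto".
counts = {
    "PERLTRIT": {"package": 357, "pacchetto": 96},
    "PERLORIT": {"package": 81, "pacchetto": 15},
}

for corpus, forms in counts.items():
    total = sum(forms.values())
    for form, fq in forms.items():
        print(corpus, form, fq, share(fq, total))
```

The same calculation applies to the other borrowing/rendering pairs compared in the slides (regular expression vs. espressione regolare, reference vs. riferimento).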

29 Looking closer: originals…
PERLORIT 131.0 script 130.7 expression 123.2 regular 118.7 array 75.0 overloading 54.1 print 50.7 reference 37.5 matching* 34.1 Hello

30 PERLORIT: regular expression
            regular expression   %      espressione regolare   %      reg. expr. + espr. reg.   %
PERLORIT    109                  50.9   105                    49.0   214                       100
PERLTRIT    10                   5.9    157                    94.0   167                       100
Searches:
[word="regular" %cd] [word="expressions?" %cd];
[lem="espressione" %cd] [lem="regolare" %cd];

31 PERLORIT: reference
            reference   %      riferimento   %      reference + riferimento   %
PERLORIT    88          38.2   142           61.7   230                       100
PERLTRIT    19          3.9    464           96.0   483                       100
Searches:
[word="references?" %cd];
[lem="riferimento" %cd];

32 PERLORIT: hello (Hello world vs Ciao mondo)
Columns: hello, %, ciao, %, hello + ciao, %
PERLORIT 31 69 100
PERLTRIT 1 2.2 43 97.7 44
Searches:
[word="hello?" %cd];
[word="ciao" %cd];

33 Looking closer: originals…
PERLORIT 131.0 script 130.7 expression 123.2 regular 118.7 array 75.0 overloading 54.1 print 50.7 reference 37.5 matching* 34.1 Hello

34 Summing up: 1a borrowings (all)
The translated corpus contains more key-borrowings than the original corpus
However, in most cases this is due to topic differences
In no cases could we identify English words found in the translated corpus with alternative Italian renderings favoured in the original corpus
On the other hand, at least 4 out of 8 key-borrowings found in the original corpus have alternative Italian renderings favoured in the translated corpus

35 1b Calqued verbs
Verbs that are significantly more frequent in PERLORIT than in PERLTRIT and vice versa
  Cut-off point: 2
  Log-likelihood ordering
  Top 100 types
Separate searches for:
  Lemmas that are “unknown” to the tagger (to search for real calques)
  Lemmas that are “not unknown” to the tagger (to search for existing Italian verbs with calqued meanings)
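The split between "unknown" and "not unknown" verb lemmas can be sketched in Python, assuming the tagged corpus is available as TreeTagger-style lines (word, tag and lemma separated by tabs, with `<unknown>` as the lemma of words missing from the tagger lexicon). The tag names in the sample and the function name are hypothetical; the only assumption about the tagset is that verb tags start with "V", as in the `pos="V.*"` searches used in the study.

```python
from collections import Counter


def split_verbs(tagged_lines):
    """Count verbs separately by whether the tagger knew their lemma."""
    known, unknown = Counter(), Counter()
    for line in tagged_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip structural markup or malformed lines
        word, pos, lemma = parts
        if not pos.startswith("V"):
            continue  # verbs only
        if lemma == "<unknown>":
            # No lemma available, so fall back on the word form itself.
            unknown[word.lower()] += 1
        else:
            known[lemma] += 1
    return known, unknown


# Hypothetical tagger output for a few tokens.
sample = [
    "ora\tADV\tora",
    "cicliamo\tVER:pres\t<unknown>",
    "ritorna\tVER:pres\tritornare",
    "ritornano\tVER:pres\tritornare",
    "array\tNOM\t<unknown>",
]
known, unknown = split_verbs(sample)
print(known, unknown)
```

Counting unknown verbs by word form explains why inflected forms such as cicliamo and cicla would surface as separate entries in a list of this kind.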

36 Results
PERLORIT
  known lemma: ritornare (fq: 90, LL: 35.9), processare (fq: 26, LL: 15.6)
  unknown lemma: cicliamo (fq: 2, LL: 3.3), cicla (fq: 2, LL: 3.3), splittare (fq: 3, LL: 4.9)
PERLTRIT
  known lemma: uccidere (fq: 6, LL: 8.3)

37 PERLTRIT: uccidere (un processo) (kill (a process))
PERLTRIT> [lem="uccidere"];
<perlfaq8>: il segnale che ha <ucciso> il processo
  -->perloren: the signal the process died from
<perlfork>: <Uccidere> il processo genitore
  -->perloren: Killing the parent process
<perlfork>: genitore viene <ucciso>(usando la funzione kill( )
  -->perloren: process is killed (either using Perl's kill( ) builtin
<perlipc>: {HUP} ad 'IGNORE' per evitare di <uccidere> sé stesso)
  -->perloren: $ SIG{HUP} to IGNORE so it doesn't kill itself)
<perlipc>: "fork( )" e "exec( )", ed <uccidere> i processi figli
  -->perloren: fork( ) and exec( ), and kill the errant child process.
<perlthrtut>: probabilmente si bloccherà finché non lo <uccidete>.
  -->perloren: This program will probably hang until you kill it .
kill + inanimate object in ukWaC-01: game (14), process (2), security (2), NHS (2), soul (2), flu (2), time (2), …
uccidere + inanimate object in itWaC3-01: musica (5, music), speranza (5, hope), amore (4, love), concorrenza (3, competition), innocenza (3, innocence), percezione (3, perception), realtà (3, reality), …

38 PERLORIT: ritornare (selected) (return)
<corso>: testuale mentre exit <ritorna> solo un codice nume
<Dalla_shell_al_web>: ript; <ritornando> poi la struttura re
<frameperl>: Tale funzione <ritorna> 0 sei il comando è
<frameperl>: exec che però non <ritorna> alcun valore. La
<javaperl>: metodo / accept( )/ <ritorna> una istanza della
<javaperl>: la funzione <ritornerebbe> un valore vero per
<mb_corso_perl_5_print>: slash ( \ ) <ritorna> una reference
<mostraLezione.php_puglisi>: iavi e le <ritorna> assemblate
<Perl_Tutorial>: ) ; viene <ritornato> vero A dire il vero
<Perl_Tutorial>: L' espressione $cibo[ 2 ] <ritorna> uva.
Fq PERLORIT 90, Fq PERLTRIT 28
NB: [lem="ritornare"] [pos="N.*"]: Fq PERLTRIT 0, Fq PERLORIT 16
Alternatives: restituire, produrre, …

39 PERLTRIT: ritornare (selected) (return)
<scopo_dello_scope>: il seme, e <ritorna> il risultato
<perlboot>: classe per <ritornare> a questo package.
<perlembed>: esaminare i valori <ritornati>, avrete
<perlfaq>: mai exec( ) non <ritorna>? Si possono fare
<perlfaq6>: di matching <ritorna> le coppie che ha tr
<perlfaq9>: he gli errori fatali <ritornino> al browser
<perlfork>: processo; il figlio <ritorna> dalla fork( )
<perlfunc>: di sistema e non <ritorna>, usate "system"
<perlfunc>: ESPR return <Ritorna> da una subroutine ,
<perlipc>: ostra FIFO. chdir; <ritorna> a casa $FIFO =
Fq PERLORIT 90, Fq PERLTRIT 28
NB: [lem="ritornare"] [pos="N.*"]: Fq PERLTRIT 0, Fq PERLORIT 16
Alternatives: restituire, produrre, …

40 PERLORIT: processare (selected) (process)
<coisson_puntata72>: nga adatta ad essere <processata> dalla shell dei
<eb_irc_check>: specificato , verrà <processato> dalla funzione on_l
<e_solo_fortuna_printable>: codice viene <processato> con un foglio
<introduzione_al_printable>: il software deve <processare> il testo
<mb_corso_perl_10_print>: truzioni, essa <processa> tutti gli elementi
<mb_corso_perl_10_print>: e di <processarlo> con il seguente cod
<mod_perl1tutorial_print>: infatti <processerà> tutte le direttive
<Perl_Tutorial>: che crei o comunque <processi> pagine html, sorge
<sostituire_ma_c_printable>: il nostro script <processa>, invece di
<tegels_usare_il_perl>: file di log viene <processata>. La variabile
Fq PERLORIT 26, Fq PERLTRIT 5
Alternatives: elaborare, manipolare…

41 PERLTRIT: processare (process)
PERLTRIT> [lem="processare"];
<perlfaq8>: poiché la shell <processa> le redirezioni
<perlfunc>: output vengono <processati> (consultate
<perlfunc>: a finire in $var <processa> la lista degl
<perlthrtut>: riato affinché venga <processato> . Una
<perlvar>: routine per <processare> gli avvertimenti
Fq PERLORIT 26, Fq PERLTRIT 5
Alternatives: elaborare, manipolare…

42 PERLORIT: ciclare (cycle)
PERLORIT> [word="cicl.*" & pos="V.*"];
<coisson_puntata71>: inviati); ora <cicliamo> sull' array
<garau_guida_perl>: consente di <ciclare> un determinato blo
<perl_tutorial_sciabarra>: il foreach <cicla> su un array e
<sostituire_ma_c_printable>: Perl <cicla> linea per linea e
<tegels_usare_il_perl>: aperto, <cicliamo> attraverso le sue
Fq PERLORIT 5, Fq PERLTRIT 0
Alternatives: iterare

43 PERLORIT: splittare (split)
PERLORIT> [word="splitt.*"];
<perl_valsesia>: in cui <splittare> il pattern.
<perl_valsesia>: si può voler <splittare> una linea
<soltanto_un_alt_printable>: <splittato> e passato
<Split_in_perl>: "<splittare>" cioè dividere una str
Fq PERLORIT 4, Fq PERLTRIT 0
Alternatives: dividere, separare

44 Summing up: calques
The comparative analysis of key verbs in the original and in the translated subcorpora suggests that authors are more at ease with the use of English (technical) calques than translators.

45 2. -s words
Search for words ending in –s in original Italian and translated Italian (fq > 1)
Select from output only plurals (unadapted borrowings) used (rather than quoted) in Italian discourse in the two subcorpora
Which corpus displays greater use of unadapted borrowings ending in –s?

46 Words ending in –s
Search: [word="[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s"];
PERLORIT: 95 types, 711 tokens
  warnings 69, Windows 56, unless 54, Mongers 38, Associates 31, keys 20, SomeClass 19, files 16, alias 15, Class 14
PERLTRIT: 144 types, 1000 tokens
  unless 85, this 60, bless 46, alias 39, exists 38, threads 37, Windows 36, warnings 34, Class 24, vars 24
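The –s search can be approximated outside CQP with Python's `re` module. The sketch below reuses the regular expression from the slide and applies it to whole tokens with `fullmatch`, on the assumption that CQP matches a pattern against the entire word form. The token list and the function name are hypothetical.

```python
import re
from collections import Counter

# Same pattern as in the corpus search, applied to whole tokens.
S_WORD = re.compile(r"[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s")


def s_words(tokens, min_fq=2):
    """Frequency list of tokens ending in -s that occur at least min_fq times."""
    counts = Counter(t for t in tokens if S_WORD.fullmatch(t))
    return {word: fq for word, fq in counts.items() if fq >= min_fq}


# Hypothetical token stream mixing Italian text and English borrowings.
tokens = ["i", "files", "sono", "files", "is", "unless", "unless", "unless", "scripts", "Perl"]
result = s_words(tokens)
print(len(result), "types,", sum(result.values()), "tokens")
```

As the raw frequency lists show, a pattern like this also returns code keywords and proper names (unless, Windows, alias), so the output still has to be filtered by hand to isolate plural borrowings used in running Italian text.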

47 Results from the PERLIT corpus
PERLORIT (word, fq): files 16, subroutines 10, backquotes 6, scripts 4, forms 4, links 4, expressions 3, cookies 3, references 2
PERLTRIT (word, fq): backticks 2, closures 1

48 PERLORIT: “files” perlorit perltrit

49 PERLORIT: “forms” perlorit perltrit

50 PERLTRIT: closures and backticks
<perlmod>: riguardo alle chiusure [<closures>, N.d.T.].
<perlref>: come le <closures> [ letteralmente " chiusure
<perlfaq8>: system( ) con quello dei <backticks> (`).
<perlfaq8>: uscita). I <backticks> (``) lanciano il coma
<perlfaq8>: shell, con i <backticks> ciò non è possibile.

51 Summing up: unadapted borrowings
Despite superficial quantitative evidence (higher numbers of types and tokens for words ending in –s in translated than in original corpora), translators appear to disfavour unadapted borrowings ending in –s with respect to original authors

52 General conclusion
Results of study 2 lend support to conclusions of study 1:
  in both fiction translation and technical translation,
  despite differences in translator profile, translation “commission”, topic, genre, readership etc.,
  and regardless of differences in methodological design/object of corpus study…

53 General conclusions
The law of growing standardisation seems to predominate over the law of interference (in present-day translation practice between English and Italian etc. etc.)
Two small steps toward the bottom-up identification of universal trends…

54 General conclusions: the lessons to be learnt
Relying on superficial quantitative data in the search for translation universals can be very misleading
Insights and hypotheses should emerge from the accumulation of results of (painstaking) analyses conducted on closely comparable corpora, checked against their parallel text component(s) and/or taking into account alternatives offered by the target language

55 Thank you

56 References
Pym, A. 2008. “On Toury's laws of how translators translate”. In Pym, A., M. Shlesinger and D. Simeoni (eds.), Beyond Descriptive Translation Studies. Benjamins.
Toury, G. 1995. Descriptive Translation Studies and Beyond. Amsterdam: Benjamins.
Tirkkonen-Condit, S. 2004. “Unique items – over- or under-represented in translated language?”. In Mauranen, A. and P. Kujamäki (eds.), Translation Universals. Benjamins. 177–184.
Baker, M. 1993. “Corpus linguistics and translation studies. Implications and applications”. In Baker, M., G. Francis and E. Tognini-Bonelli (eds.), Text and Technology. Benjamins.
Laviosa, S. 1998. “Core patterns of lexical use in a comparable corpus of English narrative prose”. Meta 43(4).
Olohan, M. 2003. “How frequent are the contractions? A study of contracted forms in the translational English corpus”. Target 15(1): 59–89.
Olohan, M. 2004. Introducing Corpora in Translation Studies. Routledge.

57 Recent critiques
“Baker (1995: 235), re-affirmed by Olohan (2004: 43), argues that translations can be studied by comparing them with non-translations in the same language, without focusing on source texts or source languages. This means we can describe translational English in opposition to non-translational English, doing all the research on English. The result is perhaps the major methodological advance associated with corpus studies. It has many economic advantages: it cuts out all the bother of learning foreign languages and cultures; it controls numerous tricky variables associated with suspicions of linguistic and cultural relativism. In the English-only research on optional that, there is thus strictly no way of knowing about any kind of foreign interference causing the frequencies of the linguistic variable, since in principle the source texts are not in the corpus”. [Pym 2008, p. 14 of pre-print version]

