Using a parallel corpus in translation practice and research Ana Frankenberg-Garcia

1 Using a parallel corpus in translation practice and research Ana Frankenberg-Garcia

2 Machine translation vs. using machines to analyse human translation

3 The study of human translation
• Traditionally not a hard science
• Difficult to study systematically
But with the technology of corpus linguistics, things can change…

4 What is a corpus? A large, machine-readable collection of texts, compiled according to specific criteria and searched with text-retrieval software
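Corpus text-retrieval software typically displays search results as a keyword-in-context (KWIC) concordance. The sketch below is a minimal, assumed illustration of that idea using only Python's standard `re` module; `kwic` and its `width` parameter are hypothetical names, not part of any COMPARA software.

```python
import re


def kwic(text: str, pattern: str, width: int = 30) -> list[str]:
    """Hypothetical helper: keyword-in-context lines for every regex match in `text`.

    Each line shows up to `width` characters of left and right context,
    with the matched word form in square brackets.
    """
    lines = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "She nodded slowly. He nods again and then stops nodding."
    for line in kwic(sample, r"\bnod\w*", width=15):
        print(line)
```

Aligning the keywords in a single column is what lets a reader scan many occurrences of the same word at once.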

5 Advantages of using corpora to study human translation
• An enormous number of translated texts
• Systematic analyses
• Quantifiable results

6 COMPARA: a bi-directional parallel corpus of Portuguese and English
Project leaders: Ana Frankenberg-Garcia & Diana Santos
Research assistants: Rosário Silva & Susana Inácio
Initial support (1999-2000): FCT (Portugal), ISLA (Lisboa), Oxford University (Language Centre)
Present funding (2001-2006): Linguateca, FCT/POSI (POSI/PLP/43931/2001)

7 COMPARA structure: PT source texts with their EN translations, and EN source texts with their PT translations

8 Source texts and translations: original Portuguese, translated English, original English, translated Portuguese
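The four-way split (original and translated Portuguese, original and translated English) can be modelled with a very small data structure. This is a hypothetical sketch, assuming each alignment unit stores its source language, the source segment and its translation; `AlignedPair`, `originals` and `translations` are illustrative names, not COMPARA's actual format.

```python
from dataclasses import dataclass
from enum import Enum


class Lang(Enum):
    PT = "Portuguese"
    EN = "English"


@dataclass(frozen=True)
class AlignedPair:
    """Hypothetical schema: one source-text segment aligned with its translation."""
    text_id: str
    source_lang: Lang
    source: str
    translation: str

    @property
    def target_lang(self) -> Lang:
        # The corpus is bi-directional PT<->EN, so the target is always the other language.
        return Lang.EN if self.source_lang is Lang.PT else Lang.PT


def originals(pairs: list[AlignedPair], lang: Lang) -> list[str]:
    """Original (untranslated) text in `lang`: source sides written in `lang`."""
    return [p.source for p in pairs if p.source_lang is lang]


def translations(pairs: list[AlignedPair], lang: Lang) -> list[str]:
    """Translated text in `lang`: target sides translated into `lang`."""
    return [p.translation for p in pairs if p.target_lang is lang]
```

Comparing `originals(pairs, Lang.PT)` with `translations(pairs, Lang.PT)` gives original versus translated Portuguese, the contrast between untranslated and translated language.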

9 COMPARA 8.0: language varieties
Portuguese: Portugal, Brazil, Angola, Mozambique
English: UK, US, South Africa
Unbalanced distribution!

10 COMPARA 8.0: publication dates, ranging from 1837 to 2002

11 COMPARA 8.0: genre
Published fiction; extensible to other genres

12 COMPARA 8.0 authors
Portuguese writers: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro

13 COMPARA 8.0 authors
Brazilian writers: Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca

14 COMPARA 8.0 authors
Angolan writers: José Eduardo Agualusa
Mozambican writers: Mia Couto

15 COMPARA 8.0 authors
British writers: David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde

16 COMPARA 8.0 authors
American writers: Henry James, Edgar Allan Poe, Richard Zimler
South African writers: Nadine Gordimer

17 Can any text be included in the corpus?
• Only published source texts and translations
• Only English translated directly from Portuguese and Portuguese translated directly from English
• Only human translations!
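The three inclusion criteria can be written as a single predicate. This is a sketch under assumed metadata: the `CandidateText` fields (including `via_lang`, the language of any intermediate translation) are hypothetical, chosen only to mirror the criteria on this slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateText:
    """Hypothetical metadata for a source text and one translation of it."""
    source_lang: str                  # "pt" or "en"
    target_lang: str                  # language of the translation
    source_published: bool
    translation_published: bool
    human_translation: bool           # False for machine translation
    via_lang: Optional[str] = None    # intermediate language, if translated indirectly


def is_eligible(c: CandidateText) -> bool:
    """Apply the three inclusion criteria: published, direct PT<->EN, human."""
    published = c.source_published and c.translation_published
    # One side Portuguese, the other English, and no intermediate language.
    direct_pt_en = {c.source_lang, c.target_lang} == {"pt", "en"} and c.via_lang is None
    return published and direct_pt_en and c.human_translation
```

A translation made from Portuguese into English via French, for example, would fail the `via_lang` check.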

18 COMPARA 8.0 texts: 71 source texts (extracts) and 74 translations

19 COMPARA 8.0 size: 1,536,269 words in English and 1,423,937 words in Portuguese
The largest edited parallel corpus containing Portuguese

20 COMPARA users and uses
• Language learners: bilingual dictionary with examples
• Language teachers: exercises and tests
• Translators: language equivalents
• Translation lecturers: exercises and problems
• Translation theorists: testing translation hypotheses
• Lexicographers: bilingual dictionaries
• Computational linguists: machine translation
Latest statistics: over 6,000 queries per month

21 COMPARA availability: free and online, for research and education



24 “nodded”



27 Studies using COMPARA
1. Observing source texts and translations
2. Contrasting Portuguese and English
3. Comparing translated and untranslated language
4. Examining the characteristics of translated texts

28 1. Observing source texts and translations
Improving bilingual dictionaries and machine-translation programs:
• Frankenberg-Garcia (2002): nod
• Ribeiro & Dias (2005): grande
• Specia et al. (2005): word-sense disambiguation
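Studies of words such as nod start by retrieving every aligned pair whose source side contains the word and then inspecting the translations. The sketch below is an assumed, simplified illustration, not the method used in the studies cited: it retrieves pairs with a regex over the source side and tallies which candidate equivalents, supplied by the analyst as regexes, occur in the aligned translations. The function names and the candidate format are hypothetical.

```python
import re
from collections import Counter


def parallel_concordance(pairs, pattern):
    """Hypothetical helper: aligned (source, translation) pairs whose source matches `pattern`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if rx.search(src)]


def tally_equivalents(hits, candidates):
    """Count the retrieved translations containing each candidate equivalent.

    `candidates` maps a label to a regex. A translation may match several
    labels; translations matching none are counted under "other".
    """
    counts = Counter()
    for _, tgt in hits:
        matched = [label for label, rx in candidates.items()
                   if re.search(rx, tgt, re.IGNORECASE)]
        counts.update(matched or ["other"])
    return counts
```

Translations counted as `"other"` are exactly the ones a human should read, since they may contain equivalents the analyst did not anticipate; a candidate such as `{"acenar": r"\bacen\w*"}` is the analyst's own assumption.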

29 2. Contrasting English and Portuguese
Contrasting original fiction in English and Portuguese: Frankenberg-Garcia (2005)
Chart: loan words and loan languages in Portuguese (PT) and English (EN) originals

30 3. Comparing translated and untranslated language
Frequency per 100,000 words in COMPARA 7.0.4 (translations vs. source texts):
diferente(s): 30.7 vs. 15.4 (about 2 x)
simplesmente: 15.6 vs. 5.1 (about 3 x)
end.* up: 13.5 vs. 2.8 (over 4 x)
lemma “rezar”: 5.6 vs. 12.4 (roughly half as frequent in translations)
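Frequencies are normalised per 100,000 words because the translated and untranslated subcorpora differ in size, so raw counts cannot be compared directly. A minimal sketch of the arithmetic, using the per-100K figures on this slide and assuming that the first figure in each pair is for translations and the second for source texts; the function names are illustrative.

```python
def per_100k(count: int, corpus_words: int) -> float:
    """Normalise a raw frequency to occurrences per 100,000 words."""
    if corpus_words <= 0:
        raise ValueError("corpus size must be positive")
    return count * 100_000 / corpus_words


def times_more_frequent(freq_translations: float, freq_source_texts: float) -> float:
    """Ratio of normalised frequencies: above 1 means over-represented in translations."""
    return freq_translations / freq_source_texts


# Per-100K figures from the slide (COMPARA 7.0.4), read as
# (translations, source texts); that column order is an assumption.
SLIDE_FIGURES = {
    "diferente(s)": (30.7, 15.4),
    "simplesmente": (15.6, 5.1),
    "end.* up": (13.5, 2.8),
    "rezar (lemma)": (5.6, 12.4),
}

if __name__ == "__main__":
    for item, (tt, st) in SLIDE_FIGURES.items():
        print(f"{item}: {times_more_frequent(tt, st):.1f}")
```

Run as a script, this prints ratios of about 2.0, 3.1, 4.8 and 0.5: diferente(s), simplesmente and end.* up are over-represented in translations, while rezar is under-represented.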

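31 4. Examining the characteristics of translated texts
Are translations longer than source texts? Frankenberg-Garcia (2004), testing the explicitation hypothesis: the idea that translators tend to spell out what the source text leaves implicit, which would make translations longer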

32 Study design
Source texts: 8 extracts of 1,500 words by 8 PT authors and 8 extracts of 1,500 words by 8 EN authors
Translations: the corresponding translations by 8 PT translators and 8 EN translators, of unknown length (?)

33 Results: translations (TT) were 5% longer than source texts (ST)
Matched t-test: TT significantly longer than ST at the 95% confidence level
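The matched (paired) t-test compares each source text with its own translation, so it tests the per-pair length differences rather than two independent groups. A minimal standard-library sketch, assuming lengths are word counts and a one-tailed test (the slide does not say which); the function names and the critical-value constant are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# One-tailed critical value of Student's t at alpha = 0.05 with 15 degrees of
# freedom, i.e. 16 matched pairs (8 PT + 8 EN source texts). Assumed test setup.
T_CRIT_ONE_TAILED_DF15 = 1.753


def paired_t(source_lengths, translation_lengths):
    """Matched-pairs t statistic for (translation - source) length differences.

    Returns (t, degrees_of_freedom); a positive t means translations tend to be longer.
    """
    if len(source_lengths) != len(translation_lengths):
        raise ValueError("each source text needs exactly one matched translation")
    diffs = [tt - st for st, tt in zip(source_lengths, translation_lengths)]
    if len(diffs) < 2:
        raise ValueError("need at least two matched pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


def percent_longer(source_lengths, translation_lengths):
    """Total translation length relative to total source length, in percent."""
    return 100 * (sum(translation_lengths) - sum(source_lengths)) / sum(source_lengths)
```

With the 16 pairs of the study design, t would be compared with `T_CRIT_ONE_TAILED_DF15`. For an exact p-value, SciPy's `scipy.stats.ttest_rel(translation_lengths, source_lengths, alternative="greater")` runs the same paired test.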

34 To conclude...
• Studies such as these were unthinkable before corpora
• Many other studies are possible!
• COMPARA is free and available online
Contact us:
