Corpus linguistics for translators Amanda Saksida University of Nova Gorica.

1 Corpus linguistics for translators Amanda Saksida University of Nova Gorica

2 ... He cast a sideways look at Harry under his bushy eyebrows. "Be grateful if yeh didn't mention that ter anyone at Hogwarts," he said. "I'm – er – not supposed ter do magic, strictly speakin'."

3 ... He cast a sideways look at Harry under his bushy eyebrows. "Be grateful if yeh didn't mention that ter anyone at Hogwarts," he said. "I'm – er – not supposed ter do magic, strictly speakin'." Hedwig, Harry, Hogwarts, Hagrid, Quidditch...

4 He cast a sideways look at Harry under his bushy eyebrows. "Be grateful if yeh didn't mention that ter anyone at Hogwarts," he said. "I'm – er – not supposed ter do magic, strictly speakin'." Hedwig, Harry, Hogwarts, Hagrid, Quidditch... wart hog = Phacochoerus aethiopicus

7 Course outline: Introduction: what corpora are, history, typology, online corpora. Areas where corpora are used. Corpus-based translation studies: interesting examples. Tools for building and using corpora.

8 What is a corpus? A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language. Computer corpus: a corpus which is encoded in a standardised and homogeneous way for open-ended retrieval tasks. Its constituent pieces of language are documented as to their origins and provenance. (Guidelines of the Expert Advisory Group on Language Engineering Standards, 1996) Big collections of modern texts; in electronic form; representative of a language/dialect; a base for descriptive (not prescriptive!) studies.

9 Brief history of corpus linguistics: 1964: Brown corpus (1 M words). John Sinclair and the COBUILD revolution => Bank of English (470 M), British National Corpus (100 M) => other languages (Czech, Hungarian, Croatian, Slovak, …). Web as corpus: with the digital revolution, more and more texts are available on the net => programs that build corpora from online texts (WebBootCat, http://www.sketchengine.co.uk/auth/wbc/mycorp.cgi).

10 Types of corpora. Medium: written texts / spoken language. Size: reference corpora / specialized corpora. Time span: synchronic / diachronic corpora. Tagging: lemmatized / POS-tagged corpora. Language: monolingual or multilingual corpora (parallel, comparable, translational).

11 Corpus usage: lexicography; descriptive grammars; translation tools and studies; foreign language learning; sociolinguistic studies; language technologies.

12 Keywords: concordance; KWIC (Keyword in Context); type / token; tag / lemma; collocation.
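Two of these keywords can be illustrated with a few lines of Python. The following is only a minimal sketch, not the method of any tool named in this presentation: the function names `tokenize`, `kwic` and `type_token_ratio` and the sample sentence are invented for illustration, and the tokenizer is a deliberately naive one for English text.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, English only)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=3):
    """Return one (left context, keyword, right context) triple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def type_token_ratio(tokens):
    """Types are distinct word forms; tokens are running words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Invented sample text, used only to exercise the functions.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

# Print the hits with the keyword aligned in a column, KWIC style.
for left, key, right in kwic(tokens, "sat", width=2):
    print(f"{left:>12} [{key}] {right}")
print(len(tokens), "tokens,", len(set(tokens)), "types")
```

A real concordancer would work on a tokenized and usually tagged corpus, so the same search could be run on lemmas or tags instead of raw word forms.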

13 What can a corpus tell us? Word frequency: how frequent is a word or word form (compared to other words)? Lexical information: which words frequently co-occur? Which affixes can a word take? Syntactic information: in which syntactic structures can a word occur? Semantic information: what are the possible meanings of a word? Pragmatic information: in which texts can we find a word? What stylistic information does a word or its context bear? Is the usage of a word stable, or is its frequency increasing or decreasing?
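The first two questions, word frequency and co-occurrence, could be approached roughly as follows. This is a sketch under stated assumptions: the helper names `relative_frequency` and `cooccurrences`, the window size and the example text are all invented here, and raw co-occurrence counts are only a first step towards a proper collocation measure.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, English only)."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequency(tokens, word, per=1000):
    """Occurrences of `word` per `per` tokens, so corpora of different sizes compare."""
    return per * tokens.count(word) / len(tokens)

def cooccurrences(tokens, node, window=2):
    """Count the words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented example text, not taken from any real corpus.
text = "strong tea and strong coffee but powerful computers and strong tea again"
tokens = tokenize(text)

freq = Counter(tokens)                       # frequency list of word forms
near_strong = cooccurrences(tokens, "strong", window=1)

print(freq.most_common(3))
print(near_strong.most_common(3))
```

Tracking `relative_frequency` for the same word across corpora from different years would be one simple way to see whether its usage is increasing or decreasing.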

14 What can a corpus tell us? Translation studies: Parallel corpus studies can reveal characteristics of translated texts, such as tendencies towards explicitness and avoidance of repetition. Comparison between the translation part of the corpus and a corpus of texts of the same genre, written in the target language for the translation corpus, reveals a tendency towards what we might call the Eliza Doolittle phenomenon: the translated texts, more than the texts in the control corpus, tend to contain those TL phrases, structures, and so on, which, from a comparative point of view, seem particularly characteristic of the TL. (Malmkjaer 1996)
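The comparison described here, a translated corpus against a control corpus of original target-language texts, can be sketched as a comparison of normalized phrase frequencies. Everything below is an assumption made for illustration: the function `phrase_rate`, the two tiny texts and the chosen phrase are invented, and such toy data say nothing about the actual findings cited above.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, English only)."""
    return re.findall(r"[a-z']+", text.lower())

def phrase_rate(tokens, phrase, per=1000):
    """Occurrences of a (possibly multi-word) phrase per `per` tokens."""
    target = tokenize(phrase)
    n = len(target)
    hits = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return per * hits / len(tokens)

# Invented stand-ins for a translated corpus and a control corpus.
translated = tokenize("As a matter of fact he left and as a matter of fact she stayed")
control = tokenize("He left and as a matter of fact she stayed at home all day long")

phrase = "as a matter of fact"
rate_translated = phrase_rate(translated, phrase)
rate_control = phrase_rate(control, phrase)
ratio = rate_translated / rate_control

print(f"translated: {rate_translated:.1f} per 1000 tokens")
print(f"control:    {rate_control:.1f} per 1000 tokens")
print(f"ratio:      {ratio:.1f}")
```

With real corpora, a difference like this would still need a significance test before being read as a property of translated language.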

15 Some of the online corpora. British National Corpus: http://www.natcorp.ox.ac.uk/, http://view.byu.edu. Bank of English: http://www.collins.co.uk/Corpus/CorpusSearch.aspx. CORIS: http://corpus.cilta.unibo.it:8080/DEMOCORISCorpQuery.html. FidaPLUS: www.fidaplus.net. Good link: http://devoted.to/corpora

16 Tools for translating. Sentence alignment: TRADOS WinAlign; ATRIL DejaVu; Vanilla Aligner (unix/linux). Concordancers: Wordsmith Tools (www.lexically.net); Sketch Engine (http://www.sketchengine.co.uk); MonoConc/ParaConc (www.athel.com); aConCorde, good for Arabic (http://www.comp.leeds.ac.uk/andyr/software/aConCorde/); CQP (ims.uni-stuttgart.de); Manatee / Bonito (www.textforge.cz).

17 Corpus linguistics in Turkey. Kemal Oflazer: http://www.andrew.cmu.edu/user/ko/. Informatics Institute corpus: http://www.ii.metu.edu.tr/~corpus/

