Corpus linguistics for translators Amanda Saksida University of Nova Gorica.



... He cast a sideways look at Harry under his bushy eyebrows. "Be grateful if yeh didn't mention that ter anyone at Hogwarts," he said. "I'm – er – not supposed ter do magic, strictly speakin'."
Hedwig, Harry, Hogwarts, Hagrid, Quidditch ...
wart hog = Phacochoerus aethiopicus

Course outline
Introduction: what corpora are, history, typology, online corpora
Areas where corpora are used
Corpus-based translation studies: interesting examples
Tools for building and using corpora

What is a corpus?
A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.
Computer corpus: a corpus which is encoded in a standardised and homogeneous way for open-ended retrieval tasks. Its constituent pieces of language are documented as to their origins and provenance.
(Guidelines of the Expert Advisory Group on Language Engineering Standards, 1996)
Big collections of modern texts
Electronic form
Representative of a language or dialect
Basis for descriptive (not prescriptive!) studies

Brief history of corpus linguistics
1964: Brown Corpus (1 M words)
John Sinclair and the Cobuild revolution => Bank of English (470 M), British National Corpus (100 M) => other languages (Czech, Hungarian, Croatian, Slovak, …)
Web as corpus: with the digital revolution, more and more texts are available on the net => programs that build corpora from online texts (WebBootCat, ...)

Types of corpora
Medium: written texts / spoken language
Size: reference corpora / specialized corpora
Time span: synchronic / diachronic corpora
Tagging: lemmatized / POS-tagged corpora
Language: monolingual or multilingual corpora (parallel, comparable, translational)

Corpus usage
Lexicography
Descriptive grammars
Translation tools and studies
Foreign language learning
Sociolinguistic studies
Language technologies

Keywords
Concordance
KWIC (Keyword in Context)
Type / Token
Tag / Lemma
Collocation
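The first three keywords can be illustrated with a minimal Python sketch. It is only an illustration, not how any particular concordancer works: the tokenizer is deliberately crude, the sample sentence is invented, and the function names (tokenize, kwic, type_token_ratio) are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=3):
    """Return one concordance line per hit: left context, [keyword], right context."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by number of running words (tokens)."""
    return len(set(tokens)) / len(tokens)

# Invented sample text, used only to show the output format.
sample = "The owl saw the boy. The boy saw the castle, and the castle was old."
tokens = tokenize(sample)

for line in kwic(tokens, "castle"):
    print(line)
print("tokens:", len(tokens), "types:", len(set(tokens)))
print("type/token ratio:", round(type_token_ratio(tokens), 2))
```

Each printed line is one KWIC hit with three words of context on either side. The sample has 15 tokens but only 8 types, because forms such as "the" and "castle" recur.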

What can a corpus tell us?
Word frequency: How frequent is a word or word form (compared to other words)?
Lexical information: Which words frequently co-occur? Which affixes can a word take?
Syntactic information: In which syntactic structures can a word occur?
Semantic information: What are the possible meanings of a word?
Pragmatic information: In which texts can we find a word? What stylistic information does a word or its context carry? Is the usage of a word stagnating, or is its frequency increasing or decreasing?
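The frequency and co-occurrence questions can be sketched in a few lines of Python using only the standard library. This assumes raw counts within a fixed window. Real corpus tools usually add statistical association measures, which are not shown here. The sample text and the function names are invented for illustration.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequencies(tokens):
    """Map each word form to its share of all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}

def cooccurrences(tokens, node, window=2):
    """Count the words that appear within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample text.
tokens = tokenize("strong tea and strong coffee but never powerful tea")

print(Counter(tokens).most_common(2))
print(relative_frequencies(tokens)["tea"])
print(cooccurrences(tokens, "strong", window=2).most_common())
```

On a real corpus the same counts would show, for example, which adjectives habitually precede a given noun.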

What can a corpus tell us? Translation studies
Parallel corpus studies can reveal characteristics of translated texts, such as tendencies towards explicitness and avoidance of repetition. Comparison between the translation part of the corpus and a corpus of texts of the same genre, written in the target language for the translation corpus, reveals a tendency towards what we might call the Eliza Doolittle phenomenon: the translated texts, more than the texts in the control corpus, tend to contain those TL phrases, structures, and so on, which, from a comparative point of view, seem particularly characteristic of the TL. (Malmkjaer 1996)
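A comparison of this kind comes down to computing the same normalized measures on the translated texts and on the control corpus. The Python sketch below shows the arithmetic only. The two "corpora" are invented one-sentence toys, and the choice of "that" as a possible explicitness marker is an assumption made for illustration, so the numbers prove nothing about real translations.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def per_thousand(tokens, word):
    """Frequency of `word` normalized to occurrences per 1,000 tokens."""
    return 1000 * tokens.count(word) / len(tokens)

def type_token_ratio(tokens):
    """Distinct word forms divided by running words; lower values mean more repetition."""
    return len(set(tokens)) / len(tokens)

# Invented toy data standing in for a translation corpus and a control corpus.
translated = tokenize("she said that she knew that he had left")
control = tokenize("she said she knew he had left early")

for name, tokens in (("translated", translated), ("control", control)):
    print(name,
          "that per 1,000:", round(per_thousand(tokens, "that"), 1),
          "type/token ratio:", round(type_token_ratio(tokens), 2))
```

Normalizing per 1,000 (or per million) tokens is what makes corpora of different sizes comparable.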

Some of the online corpora
British National Corpus
Bank of English
CORIS
FidaPLUS
Good link:

Tools for translating
Sentence alignment: TRADOS WinAlign, ATRIL DejaVu, Vanilla Aligner (unix/linux)
Concordancers: Wordsmith Tools, Sketch Engine, MonoConc/ParaConc, aConCorde (good for Arabic), CQP (ims.uni-stuttgart.de), Manatee / Bonito
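Once sentences have been aligned, a parallel concordance is conceptually a search over sentence pairs. The Python sketch below assumes the alignment already exists as a list of (source, target) pairs. It does not reproduce the behaviour or file formats of any tool listed above, and the English-German pairs and the function names are invented for illustration.

```python
import re

def words(sentence):
    """Lowercase a sentence and split it into words, dropping punctuation."""
    return re.findall(r"\w+", sentence.lower())

def parallel_concordance(aligned_pairs, term):
    """Return every aligned pair whose source sentence contains `term` as a whole word."""
    term = term.lower()
    return [(src, tgt) for src, tgt in aligned_pairs if term in words(src)]

# Invented, already aligned toy data (English source, German target).
aligned_pairs = [
    ("The owl is white.", "Die Eule ist weiß."),
    ("Harry saw the owl.", "Harry sah die Eule."),
    ("The castle is old.", "Das Schloss ist alt."),
]

for src, tgt in parallel_concordance(aligned_pairs, "owl"):
    print(f"{src}  |||  {tgt}")
```

Reading the target side of each hit shows how a source term was rendered in context, which is how a translator would typically use a parallel concordancer.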

Corpus linguistics in Turkey
Kemal Oflazer:
Informatics Institute corpus: