IT AND TRANSLATION INTRODUCTION.

Rationale for IT Applications to Translation
“A computer is a device that can be used to magnify human productivity. Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of human spirit [...] Translation is a fine and exacting art, but there is much about it that is mechanical and routine, if this were given over to a machine, the productivity of the translator would not only be magnified but this work would become more rewarding, more exciting, more human.”
Martin Kay (1987)

COURSE OVERVIEW
1) ESSENTIALS
2) TEXT PROCESSING
3) MT
4) TM
5) WORKING WITH CORPORA
6) TERMINOLOGY EXTRACTION AND GLOSSARY PRODUCTION (MONOLINGUAL AND BILINGUAL CORPORA)

COURSE OVERVIEW - DETAILS
1) ESSENTIALS (SOME OF THIS TODAY!)
- Types of computer aids
- CAT vs. MT
- History of CAT tools
- General principles of working with CAT tools
- Reference materials
- Localization and internationalization
- UNIX

COURSE OVERVIEW - DETAILS
2) TEXT PROCESSING
- Word and WordPad (tips and tricks)
- Fonts, code pages, keyboard layout, language tools in Windows XP and Office
- Speech recognition software
- Scanning
- OCR
- File types (essential info on the most common file types and file conversion utilities)

COURSE OVERVIEW - DETAILS
3) MT
- How it works
- Brief exhibition: Systran Pro, Prompt, Neuro Tran, Babelfish
  (labelled on the slide as: desktop based; supports Croatian (partially Serbian); web based)

COURSE OVERVIEW - DETAILS
4) TM
- Overview (what it is, standards and file formats)
- Desktop vs. server-based TM programs
- WinAlign
- WordFast
- Trados (nowadays SDL Trados), Freelance edition
- Sisulizer

COURSE OVERVIEW - DETAILS
5) WORKING WITH CORPORA
- Essentials
- Concordancing (WordSmith, Concordancer, AntConc)
- Advanced corpus analysis: WordSmith, TigerSearch
- Lemmatization and annotation
- Parallel corpora: ParaConc

COURSE OVERVIEW - DETAILS
6) TERMINOLOGY EXTRACTION AND GLOSSARY PRODUCTION
- Essentials
- Doing it automatically: Trados (i.e. SDL) MultiTerm (Desktop and Extract)
- Doing it semi-automatically: ParaConc, Concordancer

COURSE REQUIREMENTS
- Basic computer literacy
- Positive outlook:
  - Computers don’t bite
  - CAT tools are not complex; they are actually made to make you more efficient
- Interest in translation
- Willingness to become several times more efficient in doing translations

SCHEDULE
HONESTLY, WE DON’T KNOW FOR CERTAIN! THAT’S WHY WE NEED YOUR EMAIL ADDRESSES, SO THAT WE CAN KEEP YOU UPDATED WITH THE LATEST SCHEDULE DEVELOPMENTS.
PROBABLY:
- LOCATION: 25 (lectures) and 38 (computer lab)
- TIME: SATURDAYS AT 16:00

LITERATURE
- Geoffrey Samuelsson-Brown, A Practical Guide for Translators (Topics in Translation), Multilingual Matters, 4th edition (May 28, 2004)
- H. L. Somers (Editor), Computers and Translation: A Translator's Guide (Benjamins Translation Library, 35), John Benjamins Publishing Co, 1st edition (May 2003)
- Bert Esselink, A Practical Guide to Localization (Language International World Directory), John Benjamins Publishing Co, Revised 1st edition (September 2000)
- Silvia Pavel and Diane Nolet, Handbook of Terminology, Translation Bureau of Canada, 1st edition (2001)
- Frank Austermuhl, Electronic Tools for Translators (Translation Practices Explained), St. Jerome, 1st edition (April 2001)

COURSE OVERVIEW - GRADING
This is a hands-on course. You will be graded on the basis of the results of your practical assignments:
- Creating TMs from parallel texts (fiction and non-fiction, e.g. a book and a manual); in a way, you will also be creating a parallel corpus
- Translating two short passages (fiction and non-fiction) using your newly created TMs

IT AND TRANSLATION
ESSENTIALS AND MORE ABOUT THE COURSE

TYPES OF COMPUTER AIDS
Computer aids / tools that are relevant to translators can be roughly classified into three groups:
- Basic input and editing tools
  - Word processors
- Reference tools
  - Electronic books (desktop & web)
  - Electronic dictionaries
  - Web (Eurodicautom, onelook, etc.)
  - Software-based reference materials (encyclopedias, e-Bible, etc.)
- Productivity tools
  - TM tools
  - MT tools
  - Speech technology (i.e. voice recognition)

CAT vs MT
As soon as you start using computer software in the process of translating, you are entering the realm of COMPUTER-AIDED TRANSLATION, or CAT for short. In other words, CAT is a form of translation in which a human translator translates texts using computer software designed to support and facilitate the translation process.

CAT vs MT (continued)
The problem is that COMPUTER-AIDED TRANSLATION is sometimes also called COMPUTER-ASSISTED TRANSLATION, MACHINE-AIDED TRANSLATION or MACHINE-ASSISTED TRANSLATION. Because of the latter two terms, CAT is sometimes confused with MACHINE TRANSLATION, or MT for short.

CAT vs MT (continued)
Although these two concepts are related and similar in some aspects, CAT and MT denote two diametrically different processes:
- In CAT, the computer program merely supports the translator, so the translator translates the text himself/herself, making all the essential decisions involved.
- In MT, the translator supports the machine, that is to say: the computer (i.e. program) translates the text, which is then edited by the translator, or, in most cases, not edited at all.

CAT vs MT (continued)
Graphically represented, the difference is:
[Figure: Translation Technology Continuum, plotted against automation and human involvement, with three stages:
- Unaided Translation: translation process not aided by any electronic tools
- Computer-aided Translation (CAT): translation process aided by electronic tools such as (most typically) Translation Memory
- Automatic Translation / Machine Translation: translation process automated by use of Machine Translation]
Adapted from Hutchins & Somers (1992)

CAT – its scope
- CAT is traditionally associated with large-scale / corporate translations: manuals and technical documentation, software localization
- “Typewriter-assisted” (i.e. traditional) translation is usually associated with small-scale / individual translations (done by freelancers): fiction books, scientific papers, etc.
WRONG!!!

CAT – its scope (continued)
This notion of CAT being restricted to corporate translation projects dates back to the 90s and is based exclusively on financial criteria:
- during the early and mid 90s, a combination of a high-end computer and a high-end CAT tool cost as much as a new car
- from their very beginnings, CAT tools were designed to be capable of handling both big- and small-scale projects, but initially no freelance translator could afford them

CAT – its scope (continued)
Even for a freelance translator, the CAT route is nowadays the only possibility if one wants to provide high-quality, 100% terminologically consistent and efficiently produced translations. A testimony to that is the industry-standard TM program Trados: the Trados Freelance edition has been the company’s best-selling TM program for a number of years.

CAT tools – a bit about their history
CAT tools were developed after (very) disappointing initial experiments with MT tools. So, in order to give you a proper overview of how we got where we are now, we have to start with the history of MT tools.

MT History – how we switched to CAT
MT research began in the 1950s – Warren Weaver’s 1949 memo:
“When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.” (in Locke and Booth 1955:18)

MT History – how we switched to CAT
Initially based on some misconceptions about human translation:
- knowledge of two language systems suffices
- it is merely a matter of looking words up in dictionaries
- it is easy to define “a good translation”
- there is only one correct translation possible

MT History – how we switched to CAT
MT history milestones: pre-ALPAC
- 1954: Georgetown system demo, successful translation of 49 Russian sentences into English
- 1955-1966: $50m spent in 20 research centres in USA
- 1966: Automatic Language Processing Advisory Committee (ALPAC) Report concludes:
  - “...MT is slower, less accurate and twice as expensive as Human Translation...”
  - “...there is no prospect of useful MT either immediately or in the future...”

MT History – how we switched to CAT
MT history milestones: post-ALPAC
- 1969 – privately funded projects: Logos system (1969); Weidner-CAT (1977); ALPS (1980)
- 1975 – Météo project in Canada
- 1976 – European Commission acquires Systran
- 1979 – Eurotra project in Europe for a multilingual system
- 1980 – PC-based systems
- 1990 – data-driven systems; WebMT

MT History – how we switched to CAT
1975: Météo project in Canada
- Automatic translation of weather forecasts (En -> Fr)
- Sublanguage approach (domain-specific MT)
- Most successful MT application to date:
  - public broadcasting since 1977
  - Fr -> En available since 1989
  - only 4% of output needs post-editing
  - rapid translation staff turnover no longer a problem

MT History – how we switched to CAT
Renewed interest in MT in the late 80s and early 90s:
- Technological factors, specifically: prevalence of PCs with improved processing power
- Translation market factors:
  - official bilingualism/multilingualism creates institutional needs
  - globalisation creates huge commercial needs
- Advances in computational linguistics
- More realistic user expectations
- Internet creates casual access to multilingual information

MT History – how we switched to CAT
However, translations produced by MT were still not reliable and accurate enough for large-scale commercial applications. So, it became evident that the human translator cannot be eliminated and replaced by computers. Actually, it became obvious that computer programs should be used as TOOLS which only HELP the translator.

History of CAT Tools
- Unreliability of MT tools -> large corporations hire translation agencies
- Translation agencies find it difficult to cope with the increasing demand
- Translation agencies develop their own in-house CAT tools
- Translation agencies begin to sell their CAT tools

History of CAT Tools
Two major players in CAT tool development, Trados and STAR Group, both started as TRANSLATION AGENCIES!!!
- STAR AG was founded as a small translation agency in 1984 by Josef Zibung and Hanspeter Siegrist in the northern Swiss city of Stein am Rhein near Schaffhausen. It won and kept customers from the automotive, machine tool, computer and aeronautics industries, such as ABB, AT&T, BMW, Dornier, IBM, Mazda, Mercedes, Nissan, Saab and Siemens.
- TRADOS was founded in 1984 by Jochen Hummel and Iko Knyphausen in Stuttgart, Germany, to provide translation services for IBM.

TRADOS – timeline
- 1990 – first version of TRADOS's main component, MultiTerm, was created for DOS
- 1992 – TRADOS developed the first MultiTerm for Windows (v3.1)
- 1992 – TRADOS’s Translator's Workbench for DOS, with linguistic fuzzy-matching on translation memories
- 1994 – TRADOS’s Translator's Workbench for Windows

TRADOS – timeline (continued)
- 1997 – BREAKTHROUGH: Microsoft decides to base its internal localization memory store on TRADOS
- 1998 – Microsoft acquires a 20% share in TRADOS
TRADOS becomes a de-facto industry standard CAT tool!!! That’s why we will mostly work with TRADOS in this course (as far as TM is concerned). But we will also work with WordFast, because not everyone can afford Trados.

WHAT DO WE WANT TO TEACH YOU HERE?
TWO PRACTICAL EXAMPLES OF COMMON TRANSLATION PROBLEMS

IMPORTANT THINGS TO NOTE:
- (quite obvious) the book has an index = YOU (i.e. the translator) are supposed to make it in the translated version of the book
- a vast index = a lot of terminology
- some index terms appear on several pages that are not necessarily in the same chapter (e.g. pg. 36, pg. 92 and pg. 255) = a very serious problem for the consistency of your translation
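The consistency problem can be made concrete with a minimal Python sketch. It assumes you have somehow recorded which translation was used for an index term on each page; the records, the function name inconsistent_terms and the placeholder targets "target-A" etc. are all hypothetical, invented only for illustration.

```python
from collections import defaultdict

# Hypothetical records: (page, source term, translation used on that page).
# "target-A", "target-B", "target-C" are placeholders, not real translations.
USAGES = [
    (36, "translation memory", "target-A"),
    (92, "translation memory", "target-A"),
    (255, "translation memory", "target-B"),
    (40, "concordance", "target-C"),
]


def inconsistent_terms(usages):
    """Return the source terms that were translated in more than one way.

    The result maps each such term to {translation: [pages where it was used]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for page, term, translation in usages:
        seen[term][translation].append(page)
    return {
        term: dict(variants)
        for term, variants in seen.items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    for term, variants in inconsistent_terms(USAGES).items():
        print(term, variants)
```

With the sample records, only "translation memory" is reported, because page 255 uses a different rendering from pages 36 and 92. Keeping a translation memory and a terminology database is meant to prevent this kind of drift in the first place.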

General principles of working with CAT tools
- The main goals are EFFICIENCY and CONSISTENCY
- CAT tools = TM tools (in this case only)
The basic idea is fairly simple: documents, especially technical ones, contain a large amount of content that is similar or identical to information already contained in earlier versions or in similar documents that have been translated before. That applies to the source (editing) language (SL) as well as the target (translation) languages (TL).

General principles of working with CAT tools
So, wouldn’t it be great to re-use previously translated content as valuable reference material for new translations, so as to obtain consistency of terminology and phrasing? That is exactly what CAT tools do! CAT tools make it possible for translators to work only on content that is being created for the first time. Existing text, and text similar to existing text, is taken from the available reference translations (i.e. from the TM = translation memory).
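The re-use idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how Trados or any other TM program actually works: the memory contents, the function name lookup and the 0.75 threshold are assumptions made up for the example, and the similarity score comes from Python's standard difflib module rather than from a real TM matching algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
# The segments and their French translations are invented for illustration.
TM = {
    "Press the power button to turn on the printer.":
        "Appuyez sur le bouton d'alimentation pour allumer l'imprimante.",
    "Close the cover.": "Fermez le capot.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (match_type, stored_source, stored_translation, score).

    match_type is "exact", "fuzzy" or "none". The 0.75 threshold is an
    arbitrary illustrative value, not a setting taken from any real tool.
    """
    if segment in memory:
        return "exact", segment, memory[segment], 1.0
    best_source, best_score = None, 0.0
    for source in memory:
        # Character-based similarity between the new and the stored segment.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return "fuzzy", best_source, memory[best_source], best_score
    return "none", None, None, best_score


if __name__ == "__main__":
    for new_segment in (
        "Close the cover.",
        "Press the power button to turn off the printer.",
        "Insert a new ink cartridge.",
    ):
        print(new_segment, "->", lookup(new_segment, TM)[0])
```

The first segment is an exact match and can be re-used as it is, the second is a fuzzy match that the translator only needs to adjust, and the third has no usable match and must be translated from scratch.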

TRADOS - a screenshot

A DREAM COME TRUE? NOT REALLY :(
TO ENJOY ALL THE BENEFITS OF CAT TOOLS, FIRST YOU HAVE TO CREATE A TM AND A TERMINOLOGY DATABASE:
- either from your old translations
- or from new translations (i.e. creating a TM from scratch)
THAT IS WHERE OTHER CAT TOOLS (i.e. NON-TM CAT tools) STEP IN TO SAVE THE DAY!!!

REUSING YOUR OLD TRANSLATIONS
The best way to make a TM:
- reliable source (YOU did the translation)
- readily available (stored on your PC)

A BRIEF DIGRESSION
The term LOCALIZATION has often popped up in previous slides. What is LOCALIZATION?

WHAT IS LOCALIZATION?
Localization is the process of adapting, translating and customizing a product (software) for a specific market (for a specific locale or cultural conventions; the locale usually determines conventions such as sort order, keyboard layout, date, time, number and currency formats). In terms of software localization, this means the production of interfaces that are meaningful and comprehensible to local users.
The Localization Industry Standards Association (LISA) defines localization as follows: “Localization involves taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.”
Typically, this involves the translation of the user interface (the messages a program presents to users to enable them to create documents and data, modify them, print them, send them by e-mail, etc.).

LOCALIZATION – what it includes
Focal points of internationalization and localization efforts include:
- Language:
  - Computer-encoded text: alphabets/scripts; different systems of numerals; left-to-right vs. right-to-left scripts. Most recent systems use Unicode to solve many of these character encoding problems.
  - Graphical representations of text (printed materials, online images containing text)
  - Spoken (audio)
  - Subtitles for video
- Date/time format, including use of different calendars
- Formatting of numbers (decimal points, positioning of separators, character used as separator)
- Time zones (UTC in internationalized environments)
- Currency
- Images and colors: issues of comprehensibility and cultural appropriateness
- Names and titles
- Government-assigned numbers (such as the Social Security number in the US, National Insurance number in the UK) and passports
- Telephone numbers, addresses and international postal codes
- Weights and measures
- Paper sizes
- Differences between local standards (e.g. YU ISO or JUS) and international standards (ISO)
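To make two of these points tangible (date format and number formatting), here is a minimal Python sketch. The CONVENTIONS table and the helper names format_date and format_number are hypothetical and hand-written for illustration; real software would normally take these conventions from the operating system or a locale database instead of hard-coding them.

```python
from datetime import date

# Hypothetical, hand-written locale table for illustration only.
CONVENTIONS = {
    "en-US": {"date": "{m:02d}/{d:02d}/{y}", "thousands": ",", "decimal": "."},
    "de-DE": {"date": "{d:02d}.{m:02d}.{y}", "thousands": ".", "decimal": ","},
}


def format_date(value, locale_id):
    """Write a date in the day/month order of the given locale."""
    pattern = CONVENTIONS[locale_id]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)


def format_number(value, locale_id):
    """Write a number with the locale's thousands and decimal separators."""
    conv = CONVENTIONS[locale_id]
    text = f"{value:,.2f}"  # always "1,234,567.89" style at this point
    # Swap the separators through a temporary placeholder character.
    text = text.replace(",", "\0").replace(".", conv["decimal"])
    return text.replace("\0", conv["thousands"])


if __name__ == "__main__":
    day = date(2007, 1, 31)
    for locale_id in CONVENTIONS:
        print(locale_id, format_date(day, locale_id),
              format_number(1234567.891, locale_id))
```

The same date and the same number come out as 01/31/2007 and 1,234,567.89 for en-US, but as 31.01.2007 and 1.234.567,89 for de-DE. Nothing was translated, yet the output had to be adapted to the locale.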

LOCALIZATION vs. INTERNATIONALIZATION
The distinction between internationalization and localization is subtle but important: internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. The processes are complementary, and must be combined to achieve a system that works globally.

CAT tools for localization
Over the last couple of years, in addition to general-purpose TM tools such as Trados and Transit, translation technology companies have also developed a number of TM tools specially designed for localization:
- Alchemy CATALYST
- PASSOLO
- Sisulizer
SISULIZER is currently the industry standard localization tool, so that’s the one we will work with!!!

SISULIZER – a screenshot

Other CAT tools (non-TM based)
As we said earlier, computer-assisted translation (CAT) is a broad and somewhat imprecise term covering a range of tools, from the fairly simple to the more complicated, which can include: word processors, grammar and spell checkers, terminology managers, eBooks, eDictionaries, full-text search tools, concordancers, web, TM tools, bitexts, etc.

CAT - REFERENCE MATERIALS
Reference materials are the primary source of terminology in the absence of a translation memory. Computer-based reference materials can be classified into:
- Online libraries
- Specialized web resources
- Specialized software products
- Other materials in electronic formats

Online Libraries
Large collections of books in electronic form, e.g.
- eBrary (new scientific books, pay site)
- Internet Archive (hosting “A Million Book Project”)
- Project Gutenberg (PD fiction books, free)
- Questia (popular titles, fiction and non-fiction; pay site, some sections free)

Internet Archive

eBrary

Questia:

Specialized web resources
- Online glossaries, e.g. http://www.lai.com/glossaries.html
- Online terminology databases, e.g. EURODICAUTOM
- Acronym dictionaries, e.g. www.acronymfinder.com
- Online dictionaries, e.g. www.thefreedictionary.com
- Online corpora (e.g. BNC and COCA)

Online glossary – language automation glossary index

Online terminology databases - EURODICAUTOM

Acronym dictionary – www.acronymfinder.com

Online dictionary – www.thefreedictionary.com

BNC = British National Corpus

COCA = Corpus of Contemporary American English

Specialized software products
Various programs that can be used for terminology extraction:
- Electronic dictionaries
  - General monolingual: e.g. OED v3
  - Specialized monolingual: e.g. Cambridge Pronouncing Dictionary, Collins Collocations
  - Bilingual: e.g. Morton Benson, MidiDict
- Electronic Bible (e.g. e-Sword)
- Concordance programs (e.g. Concordancer)
- Data-mining programs (e.g. Summarizer Pro)

Electronic dictionaries - OED

Electronic Bible - e-Sword

Concordancers
Make it possible to see a word in context.
Two uses:
- Useful for finding collocations and phrases
- Useful for extracting terminology
Two types:
- Monolingual concordancers (e.g. WordSmith)
- Polylingual concordancers (e.g. ParaConc)
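What "seeing a word in context" means can be sketched with a tiny monolingual keyword-in-context (KWIC) routine in Python. It is a simplified illustration and not the way WordSmith, ParaConc or any other concordancer is implemented; the function name concordance, the width parameter and the sample text (taken from an earlier slide) are choices made for this example only.

```python
import re

# Sample text borrowed from the slide on general principles of CAT tools.
SAMPLE = (
    "CAT tools make it possible for translators to work only on content "
    "that is being created for the first time. Existing text and text "
    "similar to existing text is taken from the available reference "
    "translations."
)


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Each line shows up to `width` characters before and after the match,
    padded so that the keyword lines up in a column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    for line in concordance(SAMPLE, "text"):
        print(line)
```

Searching the sample for "text" gives three lines with the keyword aligned in the middle, which makes recurring neighbours such as "existing text" easy to spot. That is the basis for finding collocations and candidate terms.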

Monolingual Concordancer

Parallel Concordancer

Intellexer Summarizer Pro

THE END