Localization industry Word Count Standard Andrzej Zydroń CTO XTM Intl


GMX-V Localization industry Word Count Standard
Andrzej Zydroń, CTO XTM Intl
ASLING TC#39, London 2017

Thank you, Peter. Initially the keynote speaker was going to be Yves Champollion. Unfortunately Yves was taken seriously ill last month. My most sincere sympathy goes out to Yves and his family. I wish him a speedy and full recovery. Yves has contributed much to the development of our industry.

GMX: Global Information Management Metrics eXchange
Tripartite:
  GMX-V: Volume, published (2.0)
  GMX-C: Complexity (not started)
  GMX-Q: Quality (not started)
Standard for defining a L10N job
Allows for quantifying job complexity

Why GMX-V

GMX-V: GIM Metrics eXchange – Volume
Objectives:
  Unambiguous and verifiable definition of word and character counts
  A method of exchanging counts within an XML framework
Two types of count:
  Verifiable, based on electronic documents
  Non-verifiable
Canonical form: XLIFF based
Word boundaries: Unicode TR29
Unicode character encoding
Minimum conformance:
  Total Character Count
  Total Word Count

GMX-V
GMX-V 1.0: LISA OSCAR Standard, Feb 2007
GMX-V 2.0: ETSI LIS Standard, Jul 2012
http://www.xtm-intl.com/manuals/gmx-v/GMX-V-2.0.html
Version 2.0 added some additional clarification plus support for:
  Thai
  Korean
  Chinese
  Japanese

GMX-V Counts Verifiable Non-verifiable

GMX-V Canonical Form
XLIFF 1.2
Unicode Character Encoding
XML Entities must be resolved
Unicode TR#29 Word Boundaries
Remove any formatting characters
Example counts: words: 4, characters: 15, inline elements: 4, punctuation characters: 1, white space characters: 3

GMX-V Canonical Form

GMX-V White Space Characters
Unicode space characters (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but not non-breaking space ('\u00A0', '\u2007', '\u202F').
  '\u0009', HORIZONTAL TABULATION.
  '\u000A', LINE FEED.
  '\u000B', VERTICAL TABULATION.
  '\u000C', FORM FEED.
  '\u000D', CARRIAGE RETURN.
  '\u001C', FILE SEPARATOR.
  '\u001D', GROUP SEPARATOR.
  '\u001E', RECORD SEPARATOR.
  '\u001F', UNIT SEPARATOR.
  '\u200B', ZERO WIDTH SPACE.
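
The white space rule above can be sketched in Python. This is a minimal illustration of the list on the slide, not the normative GMX-V definition; the function names `is_gmx_whitespace` and `count_whitespace` are made up for this example.

```python
import unicodedata

# Non-breaking spaces that the slide explicitly excludes.
NON_BREAKING = {"\u00A0", "\u2007", "\u202F"}

# Control and format characters the slide lists individually.
EXTRA_WHITESPACE = {
    "\u0009", "\u000A", "\u000B", "\u000C", "\u000D",
    "\u001C", "\u001D", "\u001E", "\u001F", "\u200B",
}


def is_gmx_whitespace(ch):
    """Return True if ch is white space under the rule on the slide."""
    if ch in NON_BREAKING:
        return False
    if ch in EXTRA_WHITESPACE:
        return True
    # Zs = SPACE_SEPARATOR, Zl = LINE_SEPARATOR, Zp = PARAGRAPH_SEPARATOR
    return unicodedata.category(ch) in {"Zs", "Zl", "Zp"}


def count_whitespace(text):
    """Count the white space characters in text."""
    return sum(1 for ch in text if is_gmx_whitespace(ch))
```

The non-breaking spaces are checked first because they also belong to the SPACE_SEPARATOR category and would otherwise be counted.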

GMX-V Punctuation Characters
Basic Latin punctuation characters in the ranges '\u0021' - '\u002F', '\u003A' - '\u0040', '\u005B' - '\u0060', '\u007B' - '\u007E': !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
The division sign ÷ (\u00F7) and multiplication sign × (\u00D7)
The Spanish inverted exclamation and question marks, \u00A1 (¡) and \u00BF (¿)
The Armenian full stop \u0589
The Hebrew colon \u05C3, maqaf \u05BE and paseq \u05C0
The Arabic semicolon \u061B
General Unicode Punctuation: '\u2000' – '\u206F'
CJK Symbols and Punctuation: '\u3000' – '\u303F'
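
The punctuation list above translates fairly directly into a range lookup. The following Python sketch only mirrors the ranges on the slide; the names `is_gmx_punctuation` and `count_punctuation` are hypothetical. Note that the General Punctuation block also contains some characters from the white space slide (for example '\u200B'), and this sketch does not try to decide which category wins in that case.

```python
# Inclusive code point ranges taken from the slide.
PUNCT_RANGES = [
    (0x0021, 0x002F),
    (0x003A, 0x0040),
    (0x005B, 0x0060),
    (0x007B, 0x007E),
    (0x2000, 0x206F),  # General Punctuation
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
]

# Individually listed characters.
PUNCT_SINGLES = {
    0x00F7,  # division sign
    0x00D7,  # multiplication sign
    0x00A1,  # inverted exclamation mark
    0x00BF,  # inverted question mark
    0x0589,  # Armenian full stop
    0x05C3,  # Hebrew colon (sof pasuq)
    0x05BE,  # Hebrew maqaf
    0x05C0,  # Hebrew paseq
    0x061B,  # Arabic semicolon
}


def is_gmx_punctuation(ch):
    """Return True if ch falls in one of the ranges listed on the slide."""
    cp = ord(ch)
    if cp in PUNCT_SINGLES:
        return True
    return any(lo <= cp <= hi for lo, hi in PUNCT_RANGES)


def count_punctuation(text):
    """Count the punctuation characters in text."""
    return sum(1 for ch in text if is_gmx_punctuation(ch))
```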

GMX-V French & Italian Apostrophe Unicode TR#29 Section 4 l’Objectif: words: 2, characters: 10 English: can’t: words: 1, characters: 5
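
The two examples above can be reproduced with a deliberately simplified Python sketch. It is not an implementation of Unicode TR#29; it only assumes that, for French and Italian, a word break follows an apostrophe inside a token, and that the apostrophe itself is included in the character count (which is what the figures on the slide imply). All names here are hypothetical.

```python
APOSTROPHES = {"'", "\u2019"}

# Languages where an elided article or pronoun is counted as its own word.
ELIDING_LANGS = {"fr", "it"}


def split_token(token, lang):
    """Split one whitespace-delimited token into words."""
    if lang.split("-")[0].lower() in ELIDING_LANGS:
        for i, ch in enumerate(token):
            # Break after the first apostrophe that is neither the first
            # nor the last character of the token.
            if ch in APOSTROPHES and 0 < i < len(token) - 1:
                return [token[: i + 1], token[i + 1:]]
    return [token]


def count_words_and_chars(text, lang):
    """Return (word count, character count) for text in the given language."""
    words = [w for tok in text.split() for w in split_token(tok, lang)]
    chars = sum(len(w) for w in words)
    return len(words), chars


print(count_words_and_chars("l\u2019Objectif", "fr"))  # (2, 10)
print(count_words_and_chars("can\u2019t", "en"))       # (1, 5)
```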

GMX-V CJKT Word Factors Chinese (all forms): 2.8 Japanese: 3.0 Korean: 3.3 Thai: 6.0
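
As an illustration of how such factors are typically applied, the Python sketch below assumes that each factor is the average number of characters per word, so that an estimated word count is the character count divided by the factor. That reading, the language-tag keys, and the function name are assumptions of this example; the slide itself gives only the numbers, and no rounding rule is assumed.

```python
# Factors from the slide, keyed by primary language subtag (an assumption).
WORD_FACTORS = {
    "zh": 2.8,  # Chinese (all forms)
    "ja": 3.0,  # Japanese
    "ko": 3.3,  # Korean
    "th": 6.0,  # Thai
}


def estimated_word_count(char_count, lang):
    """Estimate a word count from a character count for a CJKT language."""
    primary = lang.split("-")[0].lower()
    if primary not in WORD_FACTORS:
        raise ValueError(f"no word factor defined for language {lang!r}")
    return char_count / WORD_FACTORS[primary]
```

For example, under this assumption 300 Japanese characters would be reported as 100 words.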

GMX-V Conformance Minimal Conformance: Word Counts Character Counts

GMX-V Counts
  Word Count Categories
  Character Count Categories
  Auto Text Count Categories
  Inline Element Count Categories
  Linking Inline Element Count Categories
  Text Unit Count
  Other Count Categories

GMX-V Qualitative Counts Translatable Non-translatable Qualified type

GMX-V Counts
Word and Character Count Categories:
  Protected
  ExactMatched
  LeveragedMatched
  RepetitionMatched
  FuzzyMatched
  AlphanumericOnlyTextUnit
  NumericOnlyTextUnit
  PunctuationOnlyTextUnit
  MeasurementOnlyTextUnit
  W-OtherNonTranslatableTextUnit
  TW-TranslatableTextUnit

GMX-V Counts
Auto Text Count Categories:
  SimpleNumericAutoText
  ComplexNumericAutoText
  MeasurementAutoText
  AlphaNumericAutoText
  DateAutoText
  TMAutoText
  AC-OtherAutoText

GMX-V Counts
Other Count Categories:
  TextUnitCount
  FileCount
  PageCount
  ScreenCount
  OC-OtherCountCategories

GMX-V Count Exchange Format
<metrics:metrics version="1.0" source-language="en-GB" tool-name="XYZ Tool" tool-version="1.23">
  <metrics:stage phase="initial" date="2004-12-18T13:06:52Z">
    <metrics:notes from="auser@company.com">
      Initial count based on source document.
    </metrics:notes>
    <metrics:count-group name="non-verifiable">
      <metrics:count type="OC-TestingFiles" value="99"/>
      <metrics:count type="OC-DTPFiles" value="99"/>
      <metrics:count type="ScreenCount" value="99"/>
    </metrics:count-group>
    <metrics:count-group name="verifiable">
      <metrics:count type="TotalWordCount" value="99"/>
      <metrics:count type="TotalCharacterCount" value="99"/>
      <metrics:count type="TranslatableLinkingInlineCount" value="99"/>
    </metrics:count-group>
  </metrics:stage>
</metrics:metrics>

GMX-V Count Exchange
metrics
  stage+
    notes?
    count-group+
      count+
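
The content model above (one or more stages, an optional note per stage, one or more count groups, one or more counts per group) can be sketched with Python's standard `xml.etree.ElementTree`. The namespace URI below is a hypothetical placeholder, since the slides do not show the real one; the function name and the shape of the input dictionaries are also assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder namespace: the slides do not show the real URI.
NS = "urn:example:gmx-v-metrics"
ET.register_namespace("metrics", NS)


def q(tag):
    """Qualify a tag name with the placeholder namespace."""
    return f"{{{NS}}}{tag}"


def build_metrics(source_language, tool_name, tool_version, stages, version="1.0"):
    """Build a metrics element: stage+, notes?, count-group+, count+."""
    if not stages:
        raise ValueError("at least one stage is required")
    root = ET.Element(q("metrics"), {
        "version": version,
        "source-language": source_language,
        "tool-name": tool_name,
        "tool-version": tool_version,
    })
    for stage in stages:
        stage_el = ET.SubElement(
            root, q("stage"), {"phase": stage["phase"], "date": stage["date"]}
        )
        if stage.get("notes"):  # notes are optional
            ET.SubElement(stage_el, q("notes")).text = stage["notes"]
        if not stage["groups"]:
            raise ValueError("each stage needs at least one count-group")
        for group_name, counts in stage["groups"].items():
            if not counts:
                raise ValueError("each count-group needs at least one count")
            group_el = ET.SubElement(stage_el, q("count-group"), {"name": group_name})
            for count_type, value in counts.items():
                ET.SubElement(
                    group_el, q("count"), {"type": count_type, "value": str(value)}
                )
    return root


example = build_metrics("en-GB", "XYZ Tool", "1.23", [{
    "phase": "initial",
    "date": "2004-12-18T13:06:52Z",
    "notes": "Initial count based on source document.",
    "groups": {"verifiable": {"TotalWordCount": 99, "TotalCharacterCount": 99}},
}])
print(ET.tostring(example, encoding="unicode"))
```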

Question and Answer session Better Translation Technology

Contact Details
XTM International
www.xtm-intl.com
Register for future Webinar sessions: www.xtm-intl.com/demos
Contact: azydron@xtm-intl.com
+44 (0) 1753 480 479