GMX-V: Localization Industry Word Count Standard
  Andrzej Zydroń, CTO, XTM Intl
  ASLING TC#39, London 2017
  Thank you, Peter. Initially the keynote speaker was going to be Yves Champollion. Unfortunately, Yves was taken seriously ill last month. My most sincere sympathy goes out to Yves and his family. I wish him a speedy and full recovery. Yves has contributed much to the development of our industry.
GMX: Global Information Management Metrics eXchange
  Tripartite:
    GMX-V: Volume, published (2.0)
    GMX-C: Complexity (not started)
    GMX-Q: Quality (not started)
  Standard for defining an L10N job
  Allows for quantifying job complexity
Why GMX-V
GMX-V: GIM Metrics eXchange – Volume
  Objectives:
    Unambiguous and verifiable definition of word and character counts
    A method of exchanging counts within an XML framework
  Two types of count:
    Verifiable, based on electronic documents
    Non-verifiable
  Canonical form: XLIFF based
  Word boundaries: Unicode TR29
  Unicode character encoding
  Minimum conformance:
    Total Character Count
    Total Word Count
GMX-V
  GMX-V 1.0: LISA OSCAR Standard, Feb 2007
  GMX-V 2.0: ETSI LIS Standard, Jul 2012
    http://www.xtm-intl.com/manuals/gmx-v/GMX-V-2.0.html
  Version 2.0 added additional clarification plus support for:
    Thai
    Korean
    Chinese
    Japanese
GMX-V Counts
  Verifiable
  Non-verifiable
GMX-V Canonical Form
  XLIFF 1.2
  Unicode character encoding
  XML entities must be resolved
  Unicode TR#29 word boundaries
  Remove any formatting characters
  Words: 4, characters: 15, inline elements: 4, punctuation characters: 1, white space characters: 3
GMX-V Canonical Form
GMX-V White Space Characters
  Unicode space characters (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR), but not the non-breaking spaces ('\u00A0', '\u2007', '\u202F')
  '\u0009', HORIZONTAL TABULATION
  '\u000A', LINE FEED
  '\u000B', VERTICAL TABULATION
  '\u000C', FORM FEED
  '\u000D', CARRIAGE RETURN
  '\u001C', FILE SEPARATOR
  '\u001D', GROUP SEPARATOR
  '\u001E', RECORD SEPARATOR
  '\u001F', UNIT SEPARATOR
  '\u200B', ZERO WIDTH SPACE
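The white space definition above can be sketched as a small predicate. This is a minimal illustration rather than a reference implementation: the function name `is_gmxv_whitespace` is made up here, and the mapping of SPACE_SEPARATOR, LINE_SEPARATOR and PARAGRAPH_SEPARATOR onto the Unicode general categories Zs, Zl and Zp is an assumption based on the category names.

```python
import unicodedata

# Non-breaking spaces that the slide explicitly excludes.
NON_BREAKING_SPACES = {"\u00A0", "\u2007", "\u202F"}

# Control and format characters that the slide lists individually.
LISTED_WHITESPACE = {
    "\u0009", "\u000A", "\u000B", "\u000C", "\u000D",
    "\u001C", "\u001D", "\u001E", "\u001F", "\u200B",
}

# Assumed mapping: SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR.
SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}


def is_gmxv_whitespace(ch):
    """Return True if ch counts as white space under the rules listed above."""
    if ch in NON_BREAKING_SPACES:
        return False
    if ch in LISTED_WHITESPACE:
        return True
    return unicodedata.category(ch) in SEPARATOR_CATEGORIES
```

For example, `is_gmxv_whitespace(" ")` and `is_gmxv_whitespace("\u200B")` are true, while `is_gmxv_whitespace("\u00A0")` is false because the non-breaking space is excluded before the category check.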
GMX-V Punctuation Characters
  Basic Latin punctuation characters in the ranges '\u0021'–'\u002F', '\u003A'–'\u0040', '\u005B'–'\u0060', '\u007B'–'\u007E': !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
  The division sign ÷ (\u00F7) and multiplication sign × (\u00D7)
  The Spanish inverted exclamation and question marks, ¡ (\u00A1) and ¿ (\u00BF)
  The Armenian full stop (\u0589)
  The Hebrew colon (\u05C3), maqaf (\u05BE) and paseq (\u05C0)
  The Arabic semicolon (\u061B)
  General Unicode Punctuation: '\u2000'–'\u206F'
  CJK Symbols and Punctuation: '\u3000'–'\u303F'
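The punctuation ranges above translate directly into a code point lookup. The sketch below simply encodes the ranges and single characters as listed; the name `is_gmxv_punctuation` is hypothetical. Note that the General Punctuation and CJK ranges also contain some space characters (for example U+2003 and U+3000), and the slide does not say which rule takes precedence, so this sketch makes no attempt to resolve that overlap.

```python
# Inclusive code point ranges, copied from the list above.
PUNCTUATION_RANGES = [
    (0x0021, 0x002F),
    (0x003A, 0x0040),
    (0x005B, 0x0060),
    (0x007B, 0x007E),
    (0x2000, 0x206F),  # General Punctuation block
    (0x3000, 0x303F),  # CJK Symbols and Punctuation block
]

# Individually listed characters.
PUNCTUATION_SINGLES = {
    0x00F7,  # division sign
    0x00D7,  # multiplication sign
    0x00A1,  # inverted exclamation mark
    0x00BF,  # inverted question mark
    0x0589,  # Armenian full stop
    0x05C3,  # Hebrew colon, as named on the slide
    0x05BE,  # Hebrew maqaf
    0x05C0,  # Hebrew paseq
    0x061B,  # Arabic semicolon
}


def is_gmxv_punctuation(ch):
    """Return True if ch falls in one of the punctuation ranges listed above."""
    cp = ord(ch)
    if cp in PUNCTUATION_SINGLES:
        return True
    return any(low <= cp <= high for low, high in PUNCTUATION_RANGES)
```

A tool that reports punctuation characters separately from other characters could apply this predicate to each character of the canonical text after inline elements have been removed.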
GMX-V French & Italian Apostrophe
  Unicode TR#29 Section 4
  French: l’Objectif: words: 2, characters: 10
  English: can’t: words: 1, characters: 5
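The apostrophe rule above can be illustrated with a deliberately simplified word counter. This is not an implementation of the Unicode TR#29 word boundary algorithm: it uses regular expressions over `\w` runs as a stand-in, and the function name `count_words` and the handling of language tags are assumptions made for the sketch.

```python
import re

# Straight and typographic apostrophes.
APOSTROPHES = "'\u2019"

# Default behaviour: an apostrophe between letters keeps the word together.
JOINED_WORD = re.compile(rf"\w+(?:[{APOSTROPHES}]\w+)*")

# French and Italian tailoring: the apostrophe acts as a word break.
SPLIT_WORD = re.compile(r"\w+")


def count_words(text, lang):
    """Count words, breaking at apostrophes only for French and Italian."""
    primary = lang.split("-")[0].lower()
    pattern = SPLIT_WORD if primary in {"fr", "it"} else JOINED_WORD
    return len(pattern.findall(text))
```

With this sketch, `count_words("l’Objectif", "fr")` gives 2 and `count_words("can’t", "en")` gives 1, matching the two examples on the slide.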
GMX-V CJKT Word Factors
  Chinese (all forms): 2.8
  Japanese: 3.0
  Korean: 3.3
  Thai: 6.0
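The slide lists the factors without stating how they are applied. A minimal sketch, assuming each factor is the average number of characters per word, so that a word count is obtained by dividing the character count by the factor. The function name, the language codes used as keys, and the decision to return an unrounded value are all choices made for this illustration.

```python
# Factors as listed above; interpreted here as characters per word (assumption).
CJKT_WORD_FACTORS = {
    "zh": 2.8,  # Chinese (all forms)
    "ja": 3.0,  # Japanese
    "ko": 3.3,  # Korean
    "th": 6.0,  # Thai
}


def estimate_word_count(char_count, lang):
    """Estimate a word count from a character count using the factor table."""
    primary = lang.split("-")[0].lower()
    factor = CJKT_WORD_FACTORS[primary]  # KeyError for languages not listed
    return char_count / factor
```

Under that assumption, 300 Japanese characters correspond to 100 words and 60 Thai characters to 10 words. Any rounding rule would need to be taken from the specification itself.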
GMX-V Conformance
  Minimal conformance:
    Word Counts
    Character Counts
GMX-V Counts
  Word Count Categories
  Character Count Categories
  Auto Text Count Categories
  Inline Element Count Categories
  Linking Inline Element Count Categories
  Text Unit Count
  Other Count Categories
GMX-V Qualitative Counts
  Translatable
  Non-translatable
  Qualified type
GMX-V Counts
  Word and Character Count Categories:
    Protected
    ExactMatched
    LeveragedMatched
    RepetitionMatched
    FuzzyMatched
    AlphanumericOnlyTextUnit
    NumericOnlyTextUnit
    PunctuationOnlyTextUnit
    MeasurementOnlyTextUnit
    W-OtherNonTranslatableTextUnit
    TW-TranslatableTextUnit
GMX-V Counts
  Auto Text Count Categories:
    SimpleNumericAutoText
    ComplexNumericAutoText
    MeasurementAutoText
    AlphaNumericAutoText
    DateAutoText
    TMAutoText
    AC-OtherAutoText
GMX-V Counts
  Other Count Categories:
    TextUnitCount
    FileCount
    PageCount
    ScreenCount
    OC-OtherCountCategories
GMX-V Count Exchange Format
  <metrics:metrics version="1.0" source-language="en-GB" tool-name="XYZ Tool" tool-version="1.23">
    <metrics:stage phase="initial" date="2004-12-18T13:06:52Z">
      <metrics:notes from="auser@company.com">
        Initial count based on source document.
      </metrics:notes>
      <metrics:count-group name="non-verifiable">
        <metrics:count type="OC-TestingFiles" value="99"/>
        <metrics:count type="OC-DTPFiles" value="99"/>
        <metrics:count type="ScreenCount" value="99"/>
      </metrics:count-group>
      <metrics:count-group name="verifiable">
        <metrics:count type="TotalWordCount" value="99"/>
        <metrics:count type="TotalCharacterCount" value="99"/>
        <metrics:count type="TranslatableLinkingInlineCount" value="99"/>
      </metrics:count-group>
    </metrics:stage>
  </metrics:metrics>
GMX-V Count Exchange
  metrics
    stage+
      notes?
      count-group+
        count+
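The element hierarchy above (metrics, stage, count-group, count) can be read with a few lines of standard-library code. The sketch below is illustrative only: the namespace URI is a placeholder invented so that the `metrics:` prefix parses, not the URI defined by the specification, and `read_counts` and `SAMPLE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for this sketch; NOT taken from the GMX-V specification.
NS = {"metrics": "urn:example:gmx-v-placeholder"}

SAMPLE = """\
<metrics:metrics xmlns:metrics="urn:example:gmx-v-placeholder"
    version="1.0" source-language="en-GB">
  <metrics:stage phase="initial" date="2004-12-18T13:06:52Z">
    <metrics:notes from="auser@company.com">Initial count.</metrics:notes>
    <metrics:count-group name="verifiable">
      <metrics:count type="TotalWordCount" value="99"/>
      <metrics:count type="TotalCharacterCount" value="99"/>
    </metrics:count-group>
  </metrics:stage>
</metrics:metrics>
"""


def read_counts(xml_text):
    """Return {stage phase: {count-group name: {count type: int value}}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for stage in root.findall("metrics:stage", NS):
        groups = {}
        for group in stage.findall("metrics:count-group", NS):
            groups[group.get("name")] = {
                count.get("type"): int(count.get("value"))
                for count in group.findall("metrics:count", NS)
            }
        result[stage.get("phase")] = groups
    return result
```

Calling `read_counts(SAMPLE)` returns a nested dictionary keyed by stage phase, then count-group name, then count type, which mirrors the stage, count-group, count nesting shown on the slide.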
Question and Answer session Better Translation Technology
Contact Details
  XTM International: www.xtm-intl.com
  Register for future webinar sessions: www.xtm-intl.com/demos
  Contact: azydron@xtm-intl.com, +44 (0) 1753 480 479