Published by Albert Green. Modified over 6 years ago.
1
GMX-V: Localization industry Word Count Standard
Andrzej Zydroń, CTO, XTM Intl
ASLING TC#39, London 2017
Thank you, Peter. Initially the keynote speaker was going to be Yves Champollion. Unfortunately, Yves was taken seriously ill last month. My most sincere sympathy goes out to Yves and his family, and I wish him a speedy and full recovery. Yves has contributed much to the development of our industry.
2
GMX: Global Information Management Metrics eXchange
Tripartite:
  GMX-V: Volume, published (2.0)
  GMX-C: Complexity (not started)
  GMX-Q: Quality (not started)
Standard for defining an L10N job
Allows for quantifying job complexity
3
Why GMX-V
4
GMX-V: GIM Metrics eXchange – Volume
Objectives:
  Unambiguous and verifiable definition of word and character counts
  A method of exchanging counts within an XML framework
Two types of count:
  Verifiable, based on electronic documents
  Non-verifiable
Canonical form: XLIFF based
Word boundaries: Unicode TR29
Unicode character encoding
Minimum conformance:
  Total Character Count
  Total Word Count
5
GMX-V
GMX-V 1.0: LISA OSCAR Standard, Feb 2007
GMX-V 2.0: ETSI LIS Standard, Jul 2012
Version 2.0 added some additional clarification plus support for:
  Thai
  Korean
  Chinese
  Japanese
6
GMX-V Counts
  Verifiable
  Non-verifiable
7
GMX-V Canonical Form
  XLIFF 1.2
  Unicode Character Encoding
  XML Entities must be resolved
  Unicode TR#29 Word Boundaries
  Remove any formatting characters
Words: 4, characters: 15, inline elements: 4, punctuation characters: 1, white space characters: 3
8
GMX-V Canonical Form
9
GMX-V White Space Characters
Unicode space characters (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR), but not the non-breaking spaces ('\u00A0', '\u2007', '\u202F')
'\u0009', HORIZONTAL TABULATION
'\u000A', LINE FEED
'\u000B', VERTICAL TABULATION
'\u000C', FORM FEED
'\u000D', CARRIAGE RETURN
'\u001C', FILE SEPARATOR
'\u001D', GROUP SEPARATOR
'\u001E', RECORD SEPARATOR
'\u001F', UNIT SEPARATOR
'\u200B', ZERO WIDTH SPACE
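The white space definition above can be sketched as a small classifier. This is a minimal illustration, not a reference implementation: the names `is_gmx_whitespace` and `count_whitespace` are hypothetical, and the sketch assumes that Python's `unicodedata` general categories Zs, Zl and Zp correspond to SPACE_SEPARATOR, LINE_SEPARATOR and PARAGRAPH_SEPARATOR.

```python
import unicodedata

# Non-breaking spaces that the slide excludes from the white space class.
NON_BREAKING = {"\u00A0", "\u2007", "\u202F"}

# Control and zero-width characters that the slide lists explicitly.
EXPLICIT_WHITESPACE = {
    "\u0009", "\u000A", "\u000B", "\u000C", "\u000D",
    "\u001C", "\u001D", "\u001E", "\u001F", "\u200B",
}

# Assumed mapping: Zs = SPACE_SEPARATOR, Zl = LINE_SEPARATOR,
# Zp = PARAGRAPH_SEPARATOR.
SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}


def is_gmx_whitespace(ch: str) -> bool:
    """Return True if a single character falls in the listed white space class."""
    if ch in NON_BREAKING:
        return False
    if ch in EXPLICIT_WHITESPACE:
        return True
    return unicodedata.category(ch) in SEPARATOR_CATEGORIES


def count_whitespace(text: str) -> int:
    """Count the white space characters in a string."""
    return sum(1 for ch in text if is_gmx_whitespace(ch))
```

The non-breaking check runs first because those three characters are themselves in category Zs and would otherwise be counted.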
10
GMX-V Punctuation Characters
Basic Latin punctuation characters in the ranges '\u0021'–'\u002F', '\u003A'–'\u0040', '\u005B'–'\u0060', '\u007B'–'\u007E'
The division sign ÷ ('\u00F7') and multiplication sign × ('\u00D7')
The Spanish inverted exclamation and question marks, ¡ ('\u00A1') and ¿ ('\u00BF')
The Armenian full stop ('\u0589')
The Hebrew colon ('\u05C3'), maqaf ('\u05BE') and paseq ('\u05C0')
The Arabic semicolon ('\u061B')
General Unicode Punctuation: '\u2000'–'\u206F'
CJK Symbols and Punctuation: '\u3000'–'\u303F'
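The punctuation ranges above translate directly into a code point lookup. The following is a minimal sketch with hypothetical names (`PUNCTUATION_RANGES`, `is_gmx_punctuation`, `count_punctuation`). Note that the General Punctuation and CJK Symbols blocks also contain space characters (for example '\u2000' and '\u3000'); the slide does not say which class takes precedence, and this sketch does not decide that either.

```python
# Inclusive code point ranges taken from the list above.
PUNCTUATION_RANGES = [
    (0x0021, 0x002F),  # Basic Latin: ! " # $ % & ' ( ) * + , - . /
    (0x003A, 0x0040),  # Basic Latin: : ; < = > ? @
    (0x005B, 0x0060),  # Basic Latin: [ \ ] ^ _ `
    (0x007B, 0x007E),  # Basic Latin: { | } ~
    (0x00A1, 0x00A1),  # inverted exclamation mark
    (0x00BF, 0x00BF),  # inverted question mark
    (0x00D7, 0x00D7),  # multiplication sign
    (0x00F7, 0x00F7),  # division sign
    (0x0589, 0x0589),  # Armenian full stop
    (0x05BE, 0x05BE),  # Hebrew maqaf
    (0x05C0, 0x05C0),  # Hebrew paseq
    (0x05C3, 0x05C3),  # Hebrew colon (sof pasuq)
    (0x061B, 0x061B),  # Arabic semicolon
    (0x2000, 0x206F),  # General Punctuation block
    (0x3000, 0x303F),  # CJK Symbols and Punctuation block
]


def is_gmx_punctuation(ch: str) -> bool:
    """Return True if a single character falls in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUNCTUATION_RANGES)


def count_punctuation(text: str) -> int:
    """Count the punctuation characters in a string."""
    return sum(1 for ch in text if is_gmx_punctuation(ch))
```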
11
GMX-V French & Italian Apostrophe
Unicode TR#29 Section 4
French: l’Objectif: words: 2, characters: 10
English: can’t: words: 1, characters: 5
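The two apostrophe examples can be reproduced with a deliberately simplified tokenizer. This is a sketch under stated assumptions, not an implementation of the Unicode TR#29 word boundary rules: it treats an apostrophe between word characters as word-internal, and for French and Italian it counts each apostrophe-separated part as its own word. The function name `count_words_and_chars` is hypothetical, and the character figure simply counts every character of each token, apostrophe included, because that is what matches the figures on the slide.

```python
import re

# Straight and typographic apostrophes.
APOSTROPHES = "'\u2019"

# A token is a run of word characters, optionally joined by single apostrophes.
TOKEN_RE = re.compile(rf"\w+(?:[{APOSTROPHES}]\w+)*")
SPLIT_RE = re.compile(rf"[{APOSTROPHES}]")

# Assumption: only these primary language subtags split at the apostrophe.
SPLITTING_LANGUAGES = {"fr", "it"}


def count_words_and_chars(text: str, lang: str) -> tuple[int, int]:
    """Return (word count, character count) for text in the given language."""
    primary = lang.split("-")[0].lower()
    words = 0
    chars = 0
    for token in TOKEN_RE.findall(text):
        if primary in SPLITTING_LANGUAGES:
            # "l’Objectif" counts as two words: "l" and "Objectif".
            words += len(SPLIT_RE.split(token))
        else:
            # "can’t" stays a single word.
            words += 1
        chars += len(token)
    return words, chars
```

A real implementation would need the language-specific tailoring described in TR#29 itself; words such as the French "aujourd’hui" show why a blanket split is only an approximation.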
12
GMX-V CJKT Word Factors
Chinese (all forms): 2.8
Japanese: 3.0
Korean: 3.3
Thai: 6.0
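A minimal sketch of how these factors might be applied, assuming each factor is an average number of characters per word, so that a word count is obtained by dividing the character count by the factor. The slide does not state a rounding rule, so the function returns the unrounded value. The names `WORD_FACTORS` and `estimated_word_count`, and the mapping to primary language subtags, are hypothetical.

```python
# Factors from the slide, keyed by primary language subtag (an assumption).
WORD_FACTORS = {
    "zh": 2.8,  # Chinese (all forms)
    "ja": 3.0,  # Japanese
    "ko": 3.3,  # Korean
    "th": 6.0,  # Thai
}


def estimated_word_count(char_count: int, lang: str) -> float:
    """Estimate a word count by dividing the character count by the factor.

    No rounding is applied, since the slide does not specify one.
    """
    primary = lang.split("-")[0].lower()
    if primary not in WORD_FACTORS:
        raise ValueError(f"no word factor defined for language {lang!r}")
    return char_count / WORD_FACTORS[primary]
```

For example, `estimated_word_count(300, "ja")` divides 300 characters by 3.0.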
13
GMX-V Conformance
Minimal Conformance:
  Word Counts
  Character Counts
14
GMX-V Counts
  Word Count Categories
  Character Count Categories
  Auto Text Count Categories
  Inline Element Count Categories
  Linking Inline Element Count Categories
  Text Unit Count
  Other Count Categories
15
GMX-V Qualitative Counts
  Translatable
  Non-translatable
  Qualified type
16
GMX-V Counts
Word and Character Count Categories:
  Protected
  ExactMatched
  LeveragedMatched
  RepetitionMatched
  FuzzyMatched
  AlphanumericOnlyTextUnit
  NumericOnlyTextUnit
  PunctuationOnlyTextUnit
  MeasurementOnlyTextUnit
  W-OtherNonTranslatableTextUnit
  TW-TranslatableTextUnit
17
GMX-V Counts
Auto Text Count Categories:
  SimpleNumericAutoText
  ComplexNumericAutoText
  MeasurementAutoText
  AlphaNumericAutoText
  DateAutoText
  TMAutoText
  AC-OtherAutoText
18
GMX-V Counts
Other Count Categories:
  TextUnitCount
  FileCount
  PageCount
  ScreenCount
  OC-OtherCountCategories
19
GMX-V Count Exchange Format
<metrics:metrics version="1.0" source-language="en-GB" tool-name="XYZ Tool" tool-version="1.23">
  <metrics:stage phase="initial" date=" T13:06:52Z">
    <metrics:notes>Initial count based on source document.</metrics:notes>
    <metrics:count-group name="non-verifiable">
      <metrics:count type="OC-TestingFiles" value="99"/>
      <metrics:count type="OC-DTPFiles" value="99"/>
      <metrics:count type="ScreenCount" value="99"/>
    </metrics:count-group>
    <metrics:count-group name="verifiable">
      <metrics:count type="TotalWordCount" value="99"/>
      <metrics:count type="TotalCharacterCount" value="99"/>
      <metrics:count type="TranslatableLinkingInlineCount" value="99"/>
    </metrics:count-group>
  </metrics:stage>
</metrics:metrics>
20
GMX-V Count Exchange
metrics
  stage+
    notes?
      count-group+
        count+
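The element structure above (one or more stages, each with optional notes and one or more count groups holding one or more counts) can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration only: the namespace URI is a placeholder because the slides do not show one, the helper names `build_metrics` and `read_counts` are hypothetical, and the `date` attribute is left out because its value is incomplete in the slide.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: an assumption, not taken from the slides.
METRICS_NS = "urn:example:gmx-v-metrics"
ET.register_namespace("metrics", METRICS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the placeholder metrics namespace."""
    return f"{{{METRICS_NS}}}{tag}"


def build_metrics(source_language, tool_name, tool_version, phase, notes, groups):
    """Build a metrics element with a single stage.

    groups maps a count-group name to a dict of {count type: value}.
    """
    root = ET.Element(q("metrics"), {
        "version": "1.0",
        "source-language": source_language,
        "tool-name": tool_name,
        "tool-version": tool_version,
    })
    stage = ET.SubElement(root, q("stage"), {"phase": phase})
    if notes is not None:  # notes is optional ("notes?")
        ET.SubElement(stage, q("notes")).text = notes
    for group_name, counts in groups.items():
        group = ET.SubElement(stage, q("count-group"), {"name": group_name})
        for count_type, value in counts.items():
            ET.SubElement(group, q("count"),
                          {"type": count_type, "value": str(value)})
    return root


def read_counts(root):
    """Return {group name: {count type: integer value}} from a metrics element."""
    return {
        group.get("name"): {
            count.get("type"): int(count.get("value"))
            for count in group.findall(q("count"))
        }
        for group in root.iter(q("count-group"))
    }


root = build_metrics(
    "en-GB", "XYZ Tool", "1.23", "initial",
    "Initial count based on source document.",
    {
        "non-verifiable": {"ScreenCount": 99},
        "verifiable": {"TotalWordCount": 99, "TotalCharacterCount": 99},
    },
)
xml_text = ET.tostring(root, encoding="unicode")
```

Serializing and re-parsing `xml_text` with `ET.fromstring` and passing the result to `read_counts` recovers the same groups and values.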
21
Question and Answer session
Better Translation Technology
22
Contact Details
XTM International
Register for future Webinar sessions
Contact +44 (0)