Presentation is loading. Please wait.

Presentation is loading. Please wait.

Xml:tm XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005.

Similar presentations


Presentation on theme: "Xml:tm XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005."— Presentation transcript:

1 xml:tm XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005

2 Machine Translation Translation Memory Hybrid Linguistic Inference Engines Terminology Automating Translation

3 Machine translation 40 year history Rigorous control of grammar and terminology can produce good results Lots of interesting new developments with hybrid statistical/transfer based systems Translation of free format text is theoretically impossible with current technology.

4 Translation Memory Align source and target text Look up new text against memory Relatively primitive technology Not much innovation over the past 30 years Need for proofing Proprietary translation memory formats

5 XML inherently easier to translate Separation of form and content Support for Unicode and other international encoding formats. Allows multiple output formats - PDF, XHTML, WAP Translating XML Documents

6 XML Translation Standards LISA - Localization Industry Standards Association: http://www.lisa.orghttp://www.lisa.org OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.orghttp://www.oasis-open.org W3C - World Wide Web Consortium: http://www.w3c.org http://www.w3c.org OLIF Consortium: http://www.olif.nethttp://www.olif.net

7 LISA Standards TMX - Translation Memory Exchange format: http://www.lisa.org/tmxhttp://www.lisa.org/tmx TBX - Termbase Exchange format: http://www.lisa.org/tbx http://www.lisa.org/tbx SRX - Segmentation Rules Exchange format: http://www.lisa.org/srxhttp://www.lisa.org/srx GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx http://www.lisa.org/gmx

8 OASIS L10N Standards XLIFF - XML Localization Interchange File Format: http://www.oasis- open.org/committees/tc_home.php?wg_abbrev= xliffhttp://www.oasis- open.org/committees/tc_home.php?wg_abbrev= xliff TransWS - Translation Web Services: http://www.oasis- open.org/committees/tc_home.php?wg_abbrev= trans-ws http://www.oasis- open.org/committees/tc_home.php?wg_abbrev= trans-ws DITA – Darwin Information Technology Architecture http://www.oasis- open.org/committees/tc_home.php?wg_abbrev= ditahttp://www.oasis- open.org/committees/tc_home.php?wg_abbrev= dita

9 W3C and OLIF W3C ITS http://www.w3.org/International/ http://www.w3.org/International/its OLIF - Open Lexicon Interchange Format: http://www.olif.net http://www.olif.net

10 XML namespace Major feature of XML Allows the mapping of different ontological entities onto the same representation Allows different ways to look at the same data Namespaces can be made transparent

11 xml:tm XML based text memory Revolutionary approach to translating XML documents First significant advance in translation memory technology Uses XML namespace to transparently embed contextual information

12 xml:tm namespace Text Memory namespace Can be mapped onto any XML document Vertical view of document in terms of ‘text segments’ Can be totally transparent

13 xml:tm namespace Example of the use of tm namespace in an XML document: Namespace is very flexible. It is very easy to use.

14 xml:tm namespace doc title section para tm te sentence tu te sentence tu te sentence tu tm namespace view original document view te text tu text te sentence tu para text para text para text para text para text te sentence tu te sentence tu text

15 xml:tm namespace Namespace is very simple. It is easy to use. te sentence tu original document view tm namespace view Namespace is very simple. It is easy to use. text

16 xml:tm Text Memory Author memory Maintain memory of source text Authoring statistics Authoring tool input Translation memory Automatic alignment Maintain perfect link of source and target text Reduce translation costs

17 Updated Source Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” new Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” xml:tm DOM differencing origid=”5” modified

18 xml:tm Author Memory Namespace aware DOM differencing Identify changes from the previous version Unique text unit identifiers are maintained Modification history Text units can be loaded into a database Authoring environment integration

19

20

21 xml:tm Translation Memory The tm namespace can be used to create XLIFF files Automatic alignment of source and target languages Allows for more focused translation matching –Exact matching –Leveraged matching from document - identical text –Leveraged matching from database –Modified text unit matching –Non translatable text unit identification

22 DITA Strengths Topic-centric level of granularity Very well thought out and flexible architecture for content creation and publishing Substantial reuse of existing assets Specialization at the topic and domain levels Automated processing based on meta data property Translate topic only once, reuse many times

23 DITA and xml:tm Both complement each other xml:tm encourages text reuse at the sentence level Automates translation matching and extraction Automatic alignment of source and target documents at the text unit (sentence) level Introduces the concept of exact matching for translation as well as focused matching Fully integrated with existing standards such as SRX, GMX, TMX and XLIFF

24 xml:tm translation via XLIFF Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” XLIFF Document trans-unit id=”1” trans-unit id=”2” trans-unit id=”3” trans-unit id=”4” trans-unit id=”5” trans-unit id=”6”

25 doc title section para tekst tm te zdanie tu te zdanie tu te zdanie tu translated tm namespace view translated document view te tekst tu tekst te zdanie tu para tekst para tekst para tekst para tekst para tekst te zdanie tu te zdanie tu xml:tm translated document

26 Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Exact alignment xml:tm perfect alignment

27 xml:tm perfect matching Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” modified new Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation

28 xml:tm leveraged DB memory Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Perfect alignment DB TMX

29 xml:tm in-document leveraged matching Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” modified new:same id=”3” Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation requires proofing leveraged match

30 xml:tm in-document fuzzy matching Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” mod:origid=”5” New:same Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation requires proofing fuzzy match leveraged match

31 xml:tm db leveraged matching Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” mod:origid=”5” new:same Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation requires proofing fuzzy match doc leveraged match tu id=”9” DB requires proofing DB leveraged match

32 Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” non trans tu id=”8” new:same Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Exact Matching requires translation requires proofing fuzzy match doc leveraged match tu id=”9” DB requires proofing DB leveraged match tu id=”2” requires no translation non translatable xml:tm non-translatable text

33 Traditional Translation Scenario source text PublishingTranslation source text extract Extracted text tm process Prepared text Translate Translated text target text merge target text QA

34 xml:tm source text Publishing Translator extract Extracted text tm process XLIFF file Translate xml:tm target text merge Web perfect matching leveraged matching Automatic Process Web service/ interface QA Automatic Process xml:tm Translation Scenario

35

36

37 xml:tm benefits Open Standard donated by XML INTL to LISA Complements DITA Enterprise level scalability Totally integrated within the XML framework Source text is automatically extracted and matched Word counts are controlled by the customer Text can be presented for translation via the web Data is merged automatically at end of translation cycle All memory operations are totally automated Can be used transparently for relay translations More accurate – better matching

38 xml:tm Full specification: –http://www.xml- intl.com/docs/specification/xml-tm.htmlhttp://www.xml- intl.com/docs/specification/xml-tm.html Maintained by xml-intl.com –http://www.xml-intl.com/dtd/tm.dtdhttp://www.xml-intl.com/dtd/tm.dtd –http://www.xml-intl.com/dtd/tm.xsdhttp://www.xml-intl.com/dtd/tm.xsd Detailed article on xml:tm in www.xml.com www.xml.com Donated by XML INTL to Lisa OSCAR

39

40

41

42

43 Any questions?

44 XML INTL Contact Details Postal address: PO Box 2167 Gerrards Cross Bucks SL9 8XF United Kingdom Phone: +44 1753 480 467 Fax: +44 1753 480 465 Bob Willans - bwillans@xml-intl.combwillans@xml-intl.com Andrzej Zydroń – azydron@xml-intl.comazydron@xml-intl.com Bartek Bogacki – bbogacki@xml-intl.combbogacki@xml-intl.com


Download ppt "Xml:tm XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005."

Similar presentations


Ads by Google