Part of the Multilingual Web-LT Program

Practical Visualization of ITS 2.0 Categories for Real World Localization Process

This is a presentation of an ongoing software development project. The software will display the localization metadata embedded in the content using ITS 2.0 markup.

WHAT IS ITS AND WHY IT’S SO IMPORTANT
- The Internationalization Tag Set (ITS) is a set of attributes and elements designed to provide internationalization and localization support in XML and HTML documents.
- It also defines implementations of these concepts in a namespace that XML developers can use to integrate internationalization features directly into their own XML schemas and documents.
- The set is currently almost ready/frozen.
- We believe that this is one of the key standards for the localization industry.
- The set includes a number of categories of crucial importance to translators:
  - Terminology and Localization Note metadata
  - Translate (yes/no) metadata to mark non-translatable text
- ITS metadata makes it possible to include various instructions for translators in documents, add terminology and comments, and mark non-translatable segments.
- It will reduce inconsistency in adding translation instructions to documents.
- It provides a universal interface for transferring translation metadata between tools.

The key characteristics of the ITS format are data interchange, approval as a standard, and out-of-the-box usage. ITS 2.0 is a multipurpose metadata standard: 1) it can be used to exchange metadata between different data processing tools; and 2) it can deliver metadata to the linguists who are working on the content. As a new standard, ITS 2.0 should be supported by developers of content processing tools. Its core metadata categories, such as Localization Note, Terminology, and Translate, will help to build the bridge between localization instructions and the content. ITS provides a solid basis for the development and implementation of natural language processing tools.
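As a rough illustration of the three categories named above, the snippet below shows how they can appear as local markup in HTML5 (the native `translate` attribute plus the ITS 2.0 `its-loc-note` and `its-term` attributes) and how a tool might read them. The sample sentence, the `readLocalIts` helper, and the shape of its result are assumptions made for this sketch, not part of the project.

```javascript
// Sample HTML5 fragment with local ITS 2.0 markup (illustrative content only):
//
//   <p its-loc-note="Keep the tone informal.">
//     Open the <span its-term="yes">control panel</span> and run
//     <code translate="no">setup.exe</code>.
//   </p>

// Hypothetical helper: given one element's attributes as a plain object,
// return the metadata for the three categories discussed above.
function readLocalIts(attrs) {
  const meta = {};
  if (attrs['translate'] === 'yes' || attrs['translate'] === 'no') {
    meta.translate = attrs['translate'] === 'yes';
  }
  if ('its-loc-note' in attrs) {
    meta.locNote = {
      text: attrs['its-loc-note'],
      // When no note type is given, ITS 2.0 treats the note as a "description".
      type: attrs['its-loc-note-type'] || 'description',
    };
  }
  if (attrs['its-term'] === 'yes') {
    meta.term = true;
  }
  return meta;
}

console.log(readLocalIts({ translate: 'no' }));
console.log(readLocalIts({ 'its-loc-note': 'Keep the tone informal.' }));
console.log(readLocalIts({ 'its-term': 'yes' }));
```

Working on a plain attribute map keeps the sketch independent of any particular parser; in a browser the same values would come from `getAttribute`.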

WHY ARE WE DOING THIS: DETAILS
- To make it possible to comment translatable content irrespective of its nature
- To make these instructions easily accessible to translators and editors
  - Including recommendations, instructions, terminology suggestions
  - Independent of translation tools
- Saving time: the text is already marked with context information
  - One doesn’t have to think whether something NEEDS TO BE TRANSLATED or not
  - One doesn’t have to think whether something IS A TERM or not
- Key advantages/improvements:
  - Time (i.e. cost)
  - Quality (fewer translation errors)
- Also very important for machine translation applications (post-editing in context)

Why are we doing this? From the LSP perspective, the adoption of ITS by an LSP’s clients will help to reduce project management time and unify internal workflows. Even if the clients do not support ITS yet, the LSP can still implement ITS to exchange localization metadata with translators and editors. From the translators’ and editors’ perspective, ITS metadata will help to reduce the time spent searching localization instructions for specific information and checking terms against glossaries. The result would be higher-quality translations. From the point of view of implementing new linguistic technologies, ITS will drive the development of machine translation and other NLP tools. For MT applications, for example, a core feature is support for correct terminology.

WHY ARE WE DOING THIS: WORKFLOW PARADIGM CHANGE
FROM:
- Bulk manual translation of “raw” content or post-editing of “raw” machine translation output
- External terminology glossaries, localization instructions and reference data are matched with content indirectly, mostly in the translator’s head, on the fly, and only to the extent of his/her understanding of these instructions and personal skills

TO:
- Using natural language processing (NLP) tools and ITS metadata markup to pre-populate content to be translated or post-edited with context-related information
- External terminology glossaries, localization instructions and reference data are matched with content directly through an automated process of preliminary linguistic analysis
- Pre-processing is controlled by dedicated qualified linguists/terminologists/editors

PROVIDED THAT:
- Glossaries, instructions and reference data are converted into a format compatible with NLP tools and ITS markup
- Corresponding content searching algorithms are created (including fuzzy algorithms)

In general, the implementation of ITS metadata will drive a change in the localization paradigm. In the classic paradigm, localization instructions, glossaries and other reference materials are kept separate from the content. Now you can set metadata-level relations between different pieces of content and all of that context information. However, to get the most out of this symbiosis of content and metadata, the glossaries, instructions and reference data should be prepared and converted into formats compatible with ITS markup. The second prerequisite is the development of NLP tools, especially fuzzy searching tools, to match metadata with the respective content.

What is being developed
- ITS 2.0 implementation project, a part of the Multilingual Web-LT program funded by the EU
- Developing the ITS Browser Plugin as a building block of a future “Work In Context System” (WICS)
- Making it possible to view standard ITS (Internationalization Tag Set) translation-related metadata contained in XML, XLIFF, or HTML files
  - Can be done in parallel with translating using CAT tools or for reviewing materials
  - The JavaScript plugin would support most popular browsers
  - For previewing XML or XLIFF, standalone filters for conversion into HTML will be used
- Implementation:
  - Standard-based preview solution: HTML5, JavaScript, Web browser
  - A script located in the same folder as the HTML files
  - The script is started by the browser automatically
- It is expected that both scripts and filters will be publicly available

This development project is an intersection of two parent projects. The first is the Multilingual Web-LT program funded by the EC; the second is the Logrus internal R&D program aimed at automating localization workflows and increasing translation quality and the productivity of translators and editors. The software being developed will display ITS metadata embedded in XML, XLIFF, or HTML files. The core feature is a CAT-independent preview of metadata in standard Web browsers using JavaScript while the translation is done in parallel in any CAT tool. To support the XML and XLIFF formats, text converters will be developed to transform them into HTML5.
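A minimal sketch of how such a browser-side script could locate local ITS metadata in an HTML page and attach it to the content for display. It relies only on standard DOM calls (`querySelectorAll`, `getAttribute`, `classList.add`); the function names, the `its-item` class, and the use of the `title` tooltip are assumptions for illustration and do not describe the actual plugin.

```javascript
// CSS selector for elements carrying local markup for the three core categories.
const ITS_SELECTOR = '[its-loc-note], [its-term="yes"], [translate="no"]';

// Build a short human-readable summary of one element's local ITS metadata.
function summarizeIts(el) {
  const parts = [];
  const note = el.getAttribute('its-loc-note');
  if (note !== null) parts.push('Localization note: ' + note);
  if (el.getAttribute('its-term') === 'yes') parts.push('Term');
  if (el.getAttribute('translate') === 'no') parts.push('Do not translate');
  return parts.join(' | ');
}

// Mark every annotated element so a stylesheet can highlight it and expose
// the summary as a tooltip. Returns the number of items found.
function annotateItsItems(doc) {
  const items = doc.querySelectorAll(ITS_SELECTOR);
  items.forEach((el) => {
    el.classList.add('its-item'); // hypothetical class name
    el.title = summarizeIts(el);
  });
  return items.length;
}

// In a browser, run once the page has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => annotateItsItems(document));
}
```

Passing the document in as a parameter keeps the traversal separate from the page-load hook, so the same function can be exercised against a stub object outside a browser.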

The Project Idea
- ITS metadata-enriched XML or XLIFF files: what’s inside?
- Previewing ITS metadata in a Web browser while translating content in any CAT tool
- Standard-based preview solution: HTML5, JavaScript, Web browser
- Next step: ITS metadata as a carrier for localization instructions and any reference data

The two main questions that led us to launch this project were how to preview the metadata and how to load the metadata with localization-related information. Our answer was to preview metadata in a Web browser outside any CAT tool and to base the preview solution on HTML5 and JavaScript to get a universal, flexible, open solution.

The Work Breakdown: Project Components
- Visual designs
- JavaScript scripts to render and navigate metadata and content
- Rich sample files
- Content format conversion algorithms:
  - XML+ITS -> HTML5+ITS*
  - XLIFF+ITS -> HTML5+ITS*
  - XML+ITS -> XLIFF+ITS (just an example)
  - HTML+ITS -> HTML5+ITS*

* For the purposes of visualization, some redundant ITS syntax options for HTML are not supported.

The project consists of four components: visual designs; the jQuery library and JavaScript scripts; sample XML and HTML files; and data conversion utilities.
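One small piece of the XML+ITS -> HTML5+ITS conversion is renaming local ITS attributes. ITS 2.0 derives the HTML5 name from the XML name by adding the `its-` prefix and replacing each uppercase letter with a hyphen plus its lowercase form; Translate is an exception because HTML5 has a native `translate` attribute. The sketch below applies that rule. The function name and the assumption that the ITS namespace is bound to the `its:` prefix are illustrative; a real converter would resolve the namespace properly, and other categories that map to native HTML attributes are outside the scope of this sketch.

```javascript
// Convert a local ITS attribute name from XML syntax (e.g. "its:locNote")
// to HTML5 syntax (e.g. "its-loc-note"). Names without the assumed "its:"
// prefix are returned unchanged.
function itsXmlAttrToHtml(name) {
  const prefix = 'its:';
  if (!name.startsWith(prefix)) return name;
  const local = name.slice(prefix.length);
  // HTML5 already has a native attribute for the Translate category.
  if (local === 'translate') return 'translate';
  return 'its-' + local.replace(/[A-Z]/g, (ch) => '-' + ch.toLowerCase());
}

console.log(itsXmlAttrToHtml('its:locNote'));     // its-loc-note
console.log(itsXmlAttrToHtml('its:termInfoRef')); // its-term-info-ref
console.log(itsXmlAttrToHtml('its:translate'));   // translate
```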

THE PROJECT CORE: VISUAL DESIGNS
Screen space limitations in localization process
This is the typical screen space allocation when you translate content in a CAT tool. Only about 1/3 of the screen is available for previewing metadata.

THE PROJECT CORE: VISUAL DESIGNS (CONT.)
Collapsed view of metadata
In the metadata preview area, you can navigate through the content and the metadata items within it. The selected fragment of content is enclosed in a red frame; metadata items are highlighted in blue.

Expanded view of metadata
This design shows a fragment of content with several metadata items inside. The active metadata item is marked with a red triangle. Below the content panel are the metadata item title and the metadata information panel. In this example, terminology notes are displayed.

Summary view of metadata
This is a more complicated design with all metadata items expanded. The metadata items and their information panels are numbered to show which panel corresponds to which metadata item.

Color highlighting to indicate metadata linked to content
To highlight some metadata categories within the content, you can use different background colors for the text.

Visual “tags” to indicate metadata linked to content
Other metadata categories are highlighted with pairs of visual markers similar to hypertext markup.

Visual tags to highlight metadata (example)
This is an example of the visual markers.

DEVELOPMENT STATUS
- Sample files: to be completed by end of May
- File conversion algorithms: to be completed by Sep 30:
  - XML+ITS -> XLIFF+ITS (July) (sample)
  - XML+ITS -> HTML5+ITS (August)
  - HTML+ITS -> HTML5+ITS (August)
  - XLIFF+ITS -> HTML5+ITS (September)
- Visualization scripts: to be completed by end of June

Here is the project schedule with major milestones. The project is to be completed by the end of September.

KNOWN ISSUES: FORMAT CONVERSIONS
- “Translation” of XPath expressions from source XML to target HTML
- XLIFF: MRK element to be used instead of SPAN
- Selection between SPAN and DIV elements in output HTML
- Merging external ITS rule files into internal list of rules

We have identified some data conversion issues to be resolved, including 1) conversion of XPath expressions in ITS rules from XML to HTML syntax; 2) extraction of all ITS rules from external files and merging of these rules into the target HTML files; and some other issues.
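For the rule-merging issue, one possible approach is to flatten external and internal rules into a single ordered list before embedding them in the target HTML. The sketch below assumes the ITS 2.0 precedence in which rules pulled in from a linked file are processed before the rules of the linking document and, among global rules, the last matching rule wins; this reading should be checked against the specification. The rule objects, `mergeRules`, and `effectiveRule` are hypothetical, and selector matching is reduced to string equality instead of real XPath evaluation.

```javascript
// Each rule is a simplified stand-in for an ITS global rule element, e.g.
// { category: 'translate', selector: '//code', value: 'no' }.

// Flatten rules into one list: external (linked) rules first, internal rules last.
function mergeRules(externalRules, internalRules) {
  return [...externalRules, ...internalRules];
}

// Pick the rule that applies to a selector for a category: the last match wins.
// NOTE: real ITS selectors are XPath expressions; string equality is a placeholder.
function effectiveRule(rules, category, selector) {
  let winner = null;
  for (const rule of rules) {
    if (rule.category === category && rule.selector === selector) {
      winner = rule;
    }
  }
  return winner;
}

const external = [{ category: 'translate', selector: '//code', value: 'yes' }];
const internal = [{ category: 'translate', selector: '//code', value: 'no' }];
const merged = mergeRules(external, internal);
console.log(effectiveRule(merged, 'translate', '//code').value); // no
```

Keeping the merged list ordered means the precedence decision is made in one place, which may simplify the later visualization step that parses the merged rules.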

KNOWN ISSUES: METADATA VISUALIZATION
- Parsing local standoff markup along with other rules
- Parsing list of merged ITS rules
- Hyperlinks embedded in metadata
- Static definitions like “Do not translate” for Translate category
- Highlighting active ITS item
- Displaying summary of all ITS items
- Parsing nested ITS metadata
- Differences in JavaScript implementation between browsers
- Navigation through content and ITS items
- Fragmentation of content to avoid displaying large pieces of text

We have also identified some data visualization issues to be resolved, including 1) correct parsing of ITS rules; 2) support for hyperlinks to external Web pages in metadata; 3) static descriptions used for several metadata categories; 4) nested metadata items; 5) large pieces of content to be displayed; etc.
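The nested-metadata issue can be illustrated with the Translate category, whose value in ITS 2.0 is inherited by child elements unless they override it, with element content translatable by default. The following sketch resolves the effective value for every text run in a small tree; the node shape (`translate`, `children`, `text`) and the function name are assumptions for illustration only.

```javascript
// Walk a simplified content tree and return each text run together with its
// effective Translate value. A node is either { text } or
// { translate?: 'yes' | 'no', children: [...] }.
function resolveTranslate(node, inherited = true, out = []) {
  if (typeof node.text === 'string') {
    out.push({ text: node.text, translate: inherited });
    return out;
  }
  // A local value overrides the inherited one; otherwise keep inheriting.
  let current = inherited;
  if (node.translate === 'yes') current = true;
  if (node.translate === 'no') current = false;
  for (const child of node.children || []) {
    resolveTranslate(child, current, out);
  }
  return out;
}

// A non-translatable block that contains one translatable child.
const tree = {
  translate: 'no',
  children: [
    { text: 'ERR_42: ' },
    { translate: 'yes', children: [{ text: 'File not found' }] },
  ],
};
console.log(resolveTranslate(tree));
```

Resolving inheritance once, before rendering, would let the display layer treat every text run uniformly regardless of how deeply the markup is nested.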

Live Demo
The demo samples are built on preliminary versions of the visual designs and illustrate just a few ITS data categories:
- Localization Note
- Terminology
- Translate

Here is the live demo.

THANK YOU! Questions?
