Presentation on theme: "An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services Provide name of demo, name of presenter (and affiliation."— Presentation transcript:
1 An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services Provide name of demo, name of presenter (and affiliation DCU, TCD, UL, UCD )Only display your relevant track logo(s) – remove the others.Affiliate demo presenters (Cultura, AXES, etc) – add your project logo.Introduce yourself and give the reviewers a one/two sentence synopsis of what you are demo’ing today.Affiliate demo presenters – mention the affiliated project that you are representingHighlight cross-CNGL collaboration if relevant….e.g., ‘ this demo is a collaborative effort between researchers from Dublin City University and Trinity College Dublin.Aonghus Ó hAirt, Dominic Jones, Leroy Finn and David LewisCentre for Next Generation Localisation, Trinity College Dublin
2 Challenges for Interoperability More iterative workflowsFrom push-based hand-offsTo change notification & fine grained retrievalsExtending data management to support innovationStatistical MT, named entity recognition, text analytics for QA and terminology managementAll require up-to-date relevant training corporaSolutions must sit comfortably with technology of:Content Management, Web Publishing & LocalisationNo one Standard can address it allIntegrating ITS and CMIS for L10n& a little on XLIFF, RDF and Open Provenance
4 Internationalisation Tag Set (ITS) Allows I18n and L10n tools to be instructed to treat specific text in specific waysPrinciples:Minimise disturbance of original contentDon’t reinvent wheelLink to existing meta-data before adding newDefined distinct, independent Data CategoriesIdentify relevant text using:Attributes to existing elements: LOCAL selectionXpath selectors in a special element: GLOBAL selection
6 ITS 1.0 Data Categories Translate: Localization Note: Terminology Mark whether the content of an element or attribute should be translated or notLocalization Note:Communicate notes to localizers about a particular item of contentTerminologyMark terms and optionally associate them with information, such as definitionsDirectionalitySpecify the base writing direction of blocks, embeddings and overrides for the Unicode bidirectional algorithmRubyProvide a short annotation of an associated base text, particularly useful for East Asian languagesLanguage InformationExpress the language of a given piece of contentElement within TextIdentify how an element behaves relative to its surrounding text, eg. for text segmentation purposes
8 ITS and Content Management Global ITS rules can be defined in an external fileAttribute applied to a node with following precedence:LOCAL attributesEmbedded GLOBAL rules in reverse orderExternal GLOBAL rules in reverse orderITS allows tool-specific mechanisms for associating global rules with content – precedence not specifiedCommon practice to apply a given set of rules to all documents in a project with the same schemaCan this scale to multiple overlapping schema?Can we use some CMS-level meta-data interoperability solution?
9 CMS InteroperabilityIntegrating with CMS requires the use of an API. Until now, most CMS used proprietary APIsProprietary interfaces to CMS lead to limited support, vendor lock-in and poor interoperability between CMS and with localisation toolsContent Management Interoperability Service (CMIS) from OASIS offers a standardised API for interacting with CMSLocalisation is out of scope for CMISHow can CMIS facilitate the localisation of content across multiple CMS?
10 OASIS Content Management Interoperability Services (CMIS) “defines a domain model and Web Services and Restful AtomPub bindings that can be used by applications to work with one or more Content Management repositories/systems.” (CMIS standard)Published in 2010Participation from Adobe, Alfresco, EMC, IBM, Microsoft, Oracle, SAP, and others.
11 CMIS Implementations Alfresco 3.3+ Apache Chemistry InMemory Server AthentoCOIDay Software CRXEMC DocumentumeXo Platform with xCMISFabasoftHP Autonomy Interwoven WorksiteIBM Content ManagerIBM FileNet Content ManagerIBM Content Manager On DemandIBM Connections FilesIBM LotusLive FilesIBM Lotus Quickr ListsISIS Papyrus ObjectsKnowledgeTree 3.7+Maarch 1.3Magnolia (CMS) 4.5Microsoft SharePoint Server 2010NCMISNemakiWareNuxeo Platform 5.5O3spaces 3.2+OpenIMSOpenWGA 5.2+PTC WindchillSAP NetWeaver Cloud DocumentSeapine Surround SCMSense/Net 6.0+TYPO3VB.CMISNote the distinction between client and server implementations
12 CMIS Objects A repository is a container of objects. Objects have four base types:Document object – “elementary information entities managed by the repository”Folder object – “serves as the anchor for a collection of file-able objects”Relationship object – “instantiates an explicit, binary, directional, non-invasive, and typed relationship between a Source Object and a Target Object”Policy object – “represents an administrative policy that can be enforced by a repository, such as a retention management policy.”(CMIS Specification)
13 CMS-L10n Interoperability: Two Requirements Flexible ITS rule to document bindingsThe same rule to be applied to multiple documentsMultiple rules to be applied to individual documentsSpecify the precedence order in which rules are processed for a documentAim to support external ITS rules via CMISNeed to signal L10n-relevant updates to documentsMLW-LT (ITS2.0) workgroup identified a requirement for such ‘readiness’ signallingAim to support open asynchronous change notification for CMIS
14 Design: Extending CMIS Implementations Two approaches to modelling the localisation information:Custom content modellingAlfresco aspectsImplementation in repositoryAlfresco (primary)Nuxeo (basic testing)
15 ITS rules using Policy Objects Translate rules as policy objects
16 Translate rules as folder objects ITS Rules as FoldersTranslate rules as folder objects
17 Signalling Readiness from CMS Readiness meta-dataIndicates the readiness of a document for submission to L10n processes or provide an estimate of when it will be ready for a particular processData modelready-to-process – type of process to be perfomred nextprocess-ref – a pointer to an external set of process type definitions used for ready-to-processready-at – defines the time the content is ready for the process, it could be some time in the past, or some time in the futurerevised – indicates is this is a different version of content that was previously marked as ready for the declared processpriority – high or lowcomplete-by – indicates target date-time for completing the process
18 Polling extension to CMIS Polling schemes describe the way in which documents are polled for updated readiness propertiesscheme name / IDpolling intervalnotification methodnotification target / hostport (for network connection)readiness propertyreadiness value
27 XLIFF and Open Provenance Capture XLIFF transformations that operate on content and its meta-data as the result of content processing by different localisation workflow servicesA provenance model used to capture process operationsagents and properties of those processesSupport managing & auditing quality of processescorrelating output of individual steps with professional, crowd and consumer judgementsupport end-to-end process managementterminology managementOn-demand language resource assemblye.g. for parallel text for MT training
28 Linked Localisation Data: RDF-based logging Open Provenance VocabularyActive W3C Provenance working group
29 LT Assisted Localisation Process Provenance 12401wasControlledByvalue“I am a string”j.doewasGeneratedByMachine TranswasTranslatedFromwasTranslatedFromvalue“Je suis un phrase”T10:30:00wasGeneratedAtProf transwasGeneratedByc3po1672316740msexpended15601wasGeneratedAtT12:30:00xml:langvalue15790“Je suis une string”T10:07:00wasGeneratedAtCrowdPEwasGeneratedByd.joneswasControlledBywasRevisedFromfr-FRvalue“Je suis un string”15771wasAnnotatedWithT13:17:00m.beanwasGeneratedAtCrowd ratewasControlledBywasGeneratedBy“Poor”value16727T14:05:00l.jfinnwasGeneratedAtText ClassifywasControlledBywasGeneratedByanomolousvaluewasAnnotatedWith16734T13:21:00s.curranwasGeneratedAtTrans QAwasControlledBywasGeneratedBypassvaluewasAnnotatedWith
31 Conclusion Have extended CMIS to support: Document level ITS rulesOpen document change notification mechanismStrong potential to streamline CMS-L10n integration in combination with XLIFF and PROVAchieved with current CMIS specificationCustom extension to folder objectCustom extension to policy object may be betterNext StepsCombining standards for vendor-neutral CMS integrationAligh with ITS2.0 and XLIFF2.0Discuss extensions with CMIS-compliant vendors
32 Questions. Thank You. Follow ITS Use Case at: Follow XLIFF+ITS mapping at: