1 Use of XML in the Publications Office: Critical issues for publishing
Dr. Holger Bagola
Publications Office, DIR/R 5 “IT Projects”, section “Formats & Linguistic Informatics”
October 2006

2 Outline:
– History
– From SGML to XML
– Structure of publications in Formex
– Streamlining of models
– Current status of Formex
– Particular needs for publishing
– Conclusion

4 History: In the 1970s, more and more publication procedures came to be supported by computer applications. There was no common standard for such applications in publishing, so publishing houses were confronted with a large variety of formats.

5 History: A considerable number of the documents published in the Official Journal can be totally or partially reused in other publications. Because the electronic formats of published documents were not standardized, it was impossible to set up convenient procedures for this reuse.

6 History: The first information on SGML as a future standard for the exchange of documents was published in the early 1980s. Main advantages of the approach:
– Independence from any application or operating platform
– Description of the logical document structure instead of its presentation

7 History: In 1982, the Publications Office decided to define a format for the exchange of published documents: Formex (Format for the exchange of electronic publications).

8 History:
– 1984/1985: publication of the Formex specifications
– 1985: Formex becomes part of the framework contract for OJ publications
– 1986: adoption of the SGML standard by ISO (ISO 8879)

9 History: BUT... There was no real support for the format on the market (parsers, editors, etc.). The approach seemed rather exotic to printing houses, which were used to working with the presentation of documents. The quality of the delivered SGML documents was rather poor.

10 History: Revision and partial redesign of Formex, including the addition of a basic table model. Formex 2 was easier for the framework contractors to understand. Quality improved but was still insufficient for publication: the document presentation could not be derived from the rough description of the document structure.

11 History: Total redesign of the Formex specifications:
– Implementation of a more flexible table model
– Integration of metadata into the SGML document structure
– Finer granularity and distinct elements for describing the document structure (making it possible to derive presentation from structure)

12 History: The result was a rather complex specification, which required intensive validation of the deliveries.

13 History: In 1998, the W3C adopted XML as a new standard compatible with SGML. XML was immediately accompanied by additional standards supporting the navigation and transformation of documents. A new standard for specifying XML grammars, XML Schema, was adopted in 2001.

14 History: In 2001, the Publications Office organized a Formex user meeting to discuss the future development of the approach. The main results of this meeting were:
– Migration to XML, for which various tools were available on the market (some of them open source)
– Replacement of the DTD methodology for specifying XML grammars by XML Schema

16 From SGML to XML:
– Revision of the approach in order to define a grammar that meets the needs of printing houses without abandoning the description of the logical document structure
– Definition of a table model based on the HTML model (logical relations and functions are kept in attributes)

17 From SGML to XML:
– Abandonment of parallel models: distinctions are made by context analysis
– Replacement of the character encoding based on ISO 2022 by Unicode (UTF-8, the default for XML instances)
– All documents contain a reference to the Formex schema on the web: http://formex.publications.europa.eu
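
The encoding and schema-reference rules on this slide can be checked mechanically. The following is a minimal Python sketch, not the Publications Office's validation tool. It assumes the schema is referenced through the standard xsi:noNamespaceSchemaLocation attribute and only checks that the URL starts with the Formex website address; the exact schema path is not given here.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Assumption: the schema URL starts with the Formex website address.
FORMEX_SITE = "http://formex.publications.europa.eu"
# Matches an encoding declared in the XML declaration at the start of the file.
DECLARED = re.compile(rb"""<\?xml[^>]*encoding\s*=\s*["']([^"']+)["']""")


def check_instance(data: bytes) -> List[str]:
    """Return encoding and schema-reference problems (empty list if none)."""
    try:
        data.decode("utf-8")  # the bytes themselves must be valid UTF-8
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    match = DECLARED.match(data)
    if match:
        declared = match.group(1).decode("ascii", "replace")
        if declared.lower() != "utf-8":
            problems.append(f"declared encoding {declared}")
    root = ET.fromstring(data)
    location = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if location is None:
        problems.append("no schema reference")
    elif not location.startswith(FORMEX_SITE):
        problems.append(f"unexpected schema location: {location}")
    return problems
```

An instance without an XML declaration is accepted on encoding grounds, since UTF-8 is the XML default.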

18 From SGML to XML:
– Distinction of up to four levels of a publication
– Definition of rules for the automatic validation of Formex instances beyond parsing
– Development of a tool that compares the contents of Formex instances with the corresponding PDF instances
– Automatic extraction of metadata for updating EUR-Lex
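
Automatic metadata extraction can be sketched with Python's standard ElementTree. The META/LANGUAGE, META/DATE and META/NUMBER paths are hypothetical placeholders, not Formex element names; a real extractor would follow the Formex 4 schema, and the actual update of EUR-Lex is out of scope here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical metadata paths; the real element names are defined
# in the Formex 4 schema and are not reproduced here.
FIELDS = {
    "language": "META/LANGUAGE",
    "date": "META/DATE",
    "number": "META/NUMBER",
}


def extract_metadata(xml_text: str) -> Dict[str, Optional[str]]:
    """Collect selected metadata values from one instance (None if absent)."""
    root = ET.fromstring(xml_text)
    record: Dict[str, Optional[str]] = {}
    for key, path in FIELDS.items():
        node = root.find(path)
        text = node.text.strip() if node is not None and node.text else ""
        record[key] = text or None
    return record
```

For example, extract_metadata("<ACT><META><LANGUAGE>EN</LANGUAGE></META></ACT>") returns {'language': 'EN', 'date': None, 'number': None}.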

19 From SGML to XML: The XML-based version of the Formex specifications (Formex 4) entered into force on 1 May 2004. The current release is 3.00.

21 Structure of publications in Formex: Formex instances concern OJ publications only (L and C series). Other publications are possible but have not yet been implemented.

22 Structure of publications in Formex: Description of the publication structure:
– Description of the structure and composition of the publication stricto sensu
– Description of the structure and composition of a document
– Contents of documents and sub-documents
– Non-XML parts or fragments of documents

23 Structure of publications in Formex (diagram):
Publication: description of logical structure and composition, references to documents
– Document: references to main and sub-documents
  – Main document
  – Sub-document
– Document: references to main and sub-documents
  – Main document
  – Non-XML instance

24 Structure of publications in Formex: To keep a minimum of metadata together with the contents of a document, some metadata items are repeated on several levels. All sub-levels contain references to the superior hierarchical level (except for non-XML instances).
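
The rule that every sub-level refers to its superior level can be checked across the set of files that make up a publication. This is a sketch under stated assumptions: the <PARENT.REF FILE="..."/> element is hypothetical (the real Formex reference elements differ), and any file whose name does not end in .xml is treated as a non-XML instance.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def check_upward_references(instances: Dict[str, str], top: str) -> List[str]:
    """Check that every XML sub-level refers to an existing superior level.

    `instances` maps file names to XML text; `top` is the publication file.
    Uses the hypothetical markup <PARENT.REF FILE="..."/>.
    """
    errors = []
    for name, content in instances.items():
        if name == top or not name.endswith(".xml"):
            continue  # the top level and non-XML instances carry no reference
        ref = ET.fromstring(content).find("PARENT.REF")
        if ref is None:
            errors.append(f"{name}: no reference to superior level")
        elif ref.get("FILE") not in instances:
            errors.append(f"{name}: unknown superior level {ref.get('FILE')!r}")
    return errors
```

A fuller check would also verify the downward references from the publication to its documents.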

26 Streamlining of models: Whenever a Formex 3 element could appear in various contexts, distinct elements were created. This led to parallel models such as TI.DOC, TI.ANNEX, TI.GRSEQ etc. These elements were merged into a single element, with the context expressing the distinct functions.

27 Streamlining of models:
Old                New             Selection by context
ACT/TI.DOC         ACT/TITLE       TITLE[parent::ACT]
ANNEX/TI.ANNEX     ANNEX/TITLE     TITLE[parent::ANNEX]
GR.SEQ/TI.GRSEQ    GR.SEQ/TITLE    TITLE[parent::GR.SEQ]
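
The renaming and the context-based selection in the table above can be sketched with ElementTree. This is an illustration, not the migration procedure actually used. Because ElementTree's limited XPath support has no parent:: axis, the selection function walks the parent elements explicitly.

```python
import xml.etree.ElementTree as ET
from typing import List

# Context-specific title elements used before streamlining.
OLD_TITLE_TAGS = frozenset({"TI.DOC", "TI.ANNEX", "TI.GRSEQ"})


def streamline_titles(root: ET.Element) -> None:
    """Rename the old title elements to the single TITLE element, in place."""
    for elem in root.iter():
        if elem.tag in OLD_TITLE_TAGS:
            elem.tag = "TITLE"


def titles_in_context(root: ET.Element, parent_tag: str) -> List[ET.Element]:
    """Return TITLE children of every parent_tag element.

    Equivalent to the XPath TITLE[parent::parent_tag].
    """
    return [child
            for parent in root.iter(parent_tag)  # includes root itself
            for child in parent
            if child.tag == "TITLE"]
```

After streamline_titles, the title of an annex is found with titles_in_context(root, "ANNEX") rather than by a dedicated element name.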

28 Streamlining of models: Old table model. The table model in Formex 1 to 3 was a logical one, distinguishing between the column headings, the line headings and the body. The body could easily be identified and copied to another language version.

29 Streamlining of models: Old table model. Empty cells were not present in old instances; attributes expressed the relation between cells and columns.

30 Streamlining of models: New table model. A top-down model is used for headings and body. Attributes express the distinct function of a specific cell. Empty cells are present and carry a special attribute that explicitly confirms the absence of any content.
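
A validation rule for the explicit empty-cell convention might look like the following sketch. The TBL/ROW/CELL structure and the EMPTY="YES" marker are hypothetical stand-ins, since the slide does not name the actual attribute; the point is that a blank cell must be marked and a marked cell must be blank.

```python
import xml.etree.ElementTree as ET
from typing import List

EMPTY_ATTR = "EMPTY"  # hypothetical attribute name marking an empty cell


def cell_is_blank(cell: ET.Element) -> bool:
    """True if the cell contains no text at any depth."""
    return not "".join(cell.itertext()).strip()


def check_cells(table: ET.Element) -> List[str]:
    """Report cells whose content contradicts the empty-cell marker."""
    problems = []
    for r, row in enumerate(table.iter("ROW"), start=1):
        for c, cell in enumerate(row.findall("CELL"), start=1):
            marked = cell.get(EMPTY_ATTR) == "YES"
            blank = cell_is_blank(cell)
            if blank and not marked:
                problems.append(f"row {r}, cell {c}: empty but not marked")
            elif marked and not blank:
                problems.append(f"row {r}, cell {c}: marked empty but has content")
    return problems
```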

32 Current status of Formex: Formex 4 is entirely based on W3C XML Schema. It has been in use since May 2004. Minor changes have been integrated (release 3.0). All OJ (L and C) documents are covered. Further document types (not published in the OJ) will be taken into account.

33 Current status of Formex: The specification, the documentation of all elements, the physical specification and more than 600 examples are publicly available on the website: http://formex.publications.europa.eu

34 Current status of Formex:
– Availability of Formex via the LegisWrite interface
– XML instances are not (yet?) publicly accessible
– Different quality levels according to validation

35 Current status of Formex (diagram showing: printing house, CERES, quality levels 1, 2 and 3, automatic validation, manual validation, EUDOR, LegisWrite interface with conversion to LW, client)

37 Particular needs for publishing: Publishing mostly concerns the presentation of documents in a readable form. A “good” logical XML model allows the presentation of a given document to be derived from it. Printing houses are obliged to work with Formex instances throughout the production process.

38 Particular needs for publishing: Some parts of a document (words, parts of a sentence) require a specific presentation that does not always follow from the logical structure. Specific elements for text highlighting and presentation had to be created. Example: in some language versions, foreign words are set in italics.

39 Particular needs for publishing: Quotation marks differ from one language version to another. Because of exceptions in their use on nested levels, the specific symbols must be present in the instance.
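
Why the symbols have to be stored in the instance becomes clearer when the marks are chosen by language and nesting level. In this sketch the mark pairs are illustrative assumptions, not the official rules of any language version, and the spacing French uses inside guillemets is ignored.

```python
# Illustrative quotation-mark pairs per language and nesting level.
# These values are assumptions for the sketch, not official rules.
QUOTES = {
    "en": [("\u2018", "\u2019"), ("\u201c", "\u201d")],  # ‘ ’ then “ ”
    "de": [("\u201e", "\u201c"), ("\u201a", "\u2018")],  # „ “ then ‚ ‘
    "fr": [("\u00ab", "\u00bb"), ("\u201c", "\u201d")],  # « » then “ ”
}


def quote(text: str, lang: str, level: int = 1) -> str:
    """Wrap text in the marks for the given language and nesting level.

    Levels beyond those defined cycle back to the first pair.
    """
    pairs = QUOTES[lang]
    opening, closing = pairs[(level - 1) % len(pairs)]
    return f"{opening}{text}{closing}"
```

For example, quote("Anhang " + quote("A", "de", 2), "de") returns „Anhang ‚A‘“, whereas the same nesting in English would yield ‘Anhang “A”’.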

40 Particular needs for publishing: In special cases, printing houses are allowed to use temporary additional markup (processing instructions, elements from other namespaces). In most cases, this kind of information depends on the publishing system.

41 Particular needs for publishing: All this additional information has to be deleted before the electronic version of the publication is sent. When new elements are designed, their relation to presentation has to be analyzed. In most cases, the design must guarantee that the new element can be correctly identified.
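
Removing the temporary markup before delivery can be sketched as follows. It assumes that Formex elements are in no namespace, so any namespaced element is treated as foreign; foreign elements are dropped together with their content, while the text that follows them is kept. ElementTree only keeps processing instructions if the parser is built with TreeBuilder(insert_pis=True) (Python 3.8+).

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_with_pis(text: str) -> ET.Element:
    """Parse XML while keeping processing instructions in the tree."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    return ET.fromstring(text, parser=parser)


def _drop(parent: ET.Element, child: ET.Element, prev: Optional[ET.Element]) -> None:
    """Remove child from parent but keep the text that follows it (its tail)."""
    if child.tail:
        if prev is not None:
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def strip_temporary_markup(elem: ET.Element) -> None:
    """Remove processing instructions and namespaced elements, in place."""
    prev = None  # last child that was kept
    for child in list(elem):
        is_pi = child.tag is ET.ProcessingInstruction
        is_foreign = isinstance(child.tag, str) and child.tag.startswith("{")
        if is_pi or is_foreign:
            _drop(elem, child, prev)
        else:
            strip_temporary_markup(child)
            prev = child
```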

42 Particular needs for publishing: Conversion into other electronic formats requires similar measures. Regular derivations are:
– Presentation in the Official Journal
– Presentation in LegisWrite
– Presentation in HTML

43 Particular needs for publishing (diagram): a Formex (XML) instance is converted into:
– format “Official Journal” (PDF)
– format “LegisWrite” (RTF)
– format “EUR-Lex” (HTML)
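
Deriving a presentation from the logical structure can be sketched as a recursive conversion to HTML. The element names TITLE, P and HT with a TYPE attribute are assumptions loosely modeled on Formex, and this is an illustration rather than the conversion used in production.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mappings from Formex-like elements to HTML tags.
BLOCKS = {"TITLE": "h1", "P": "p"}
HIGHLIGHT = {"ITALIC": "i", "BOLD": "b"}


def to_html(elem: ET.Element) -> str:
    """Convert an element and its descendants to an HTML string."""
    inner = escape(elem.text or "", quote=False)
    for child in elem:
        inner += to_html(child) + escape(child.tail or "", quote=False)
    if elem.tag in BLOCKS:
        tag = BLOCKS[elem.tag]
    elif elem.tag == "HT":
        # Highlighting: presentation chosen from the TYPE attribute.
        tag = HIGHLIGHT.get(elem.get("TYPE", ""), "span")
    else:
        return inner  # unknown wrapper element: keep only its content
    return f"<{tag}>{inner}</{tag}>"
```

The same tree could feed other outputs (PDF, RTF) by swapping the mapping tables, which is the benefit of keeping presentation out of the instance.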

45 Conclusion: Since its beginnings, Formex has been a common exchange format that is independent of any application or platform. Character encoding has been clearly defined in all versions.

46 Conclusion: Availability of tools on the market for XML-based instances:
– RXP for validating parsing against DTDs
– XSV for validating parsing against XML Schema
– XMLSpy for development (+ Saxon)
– XMetaL for content editing
– RenderX for PDF generation

47 Conclusion: Stylesheets (based on XSL-FO) for presentation. Future enhancements:
– Better integration of other source formats (RTF/LegisWrite)
– Addition of other document types not necessarily related to the Official Journal