ETD2004, June 3-5 2004, University of Kentucky, Lexington: Structured ETDs at the Document and Publication Server of Humboldt University. From DTD generation to XML conversion


1 ETD2004, June 3-5 2004, University of Kentucky, Lexington
Structured ETDs at the Document and Publication Server of Humboldt University
From DTD generation to XML conversion
Uwe Müller, Humboldt University, Berlin, Electronic Publishing Group
u.mueller@cms.hu-berlin.de

[Slide header, repeated on every slide: ETD2004, June 3-5 2004, University of Kentucky, Lexington. From DTD generation to XML conversion: Structured ETDs at Humboldt's EDoc Server. Uwe Müller, Electronic Publishing Group, CMS / UB Humboldt University, Berlin]

2 Background
Humboldt University: 800 – 1,000 dissertations / year
Germany: duty to publish dissertations
–traditional methods:
  publishing house
  microfiche
  40 … 200 printed copies (depending on faculty regulations)
Humboldt U.: not mandatory to submit an ETD
~ ¼ of dissertations published electronically
XML as central strategy

3 Why XML?
Standardized format
Long term preservation
easily convertible to
–presentation formats (HTML, PDF)
–other XML structures
qualified full text retrieval
contains structural and contextual information, in a machine readable format
[Diagram labels: HTML / digital signature, PDF / digital signature, Office document / digital signature, XML]

4 XML: Restrictions to deal with
XML source does not contain layout information
rather linear structure
XML is not used as an authoring system
–authors use their 'own' systems: Microsoft Word, LaTeX, Open Office / Star Office, Framemaker, Word Perfect

5 How to bridge the gap?
meet the authors where they are …
instructions and guidelines for authors
–usage of style files (e.g., dissertation-hu.dot)
–manuals, support hotline, regular courses
different conversion processes
–SGML author (plug-in for MS Word <= 97)
–Open Office / Star Office: exploit the native XML format
–MS Office 2003 → XML according to DiML DTD
–common pitfalls: tables, pictures

7 Conversion Process Using Open Office
[Flow diagram. Labels in transcript order: example.doc, Open Office, example.sxw (zip file), content.xml, example_stl.xml, example.xml, front.xml, chapter1.xml, chapter2.xml, chapter3.xml, example.html, *.gif, *.jpg, front.html, chapter1.html, chapter2.html, chapter3.html]
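
The slide notes that example.sxw is a zip file containing content.xml. A minimal sketch of that single step, pulling content.xml out of the archive, might look as follows. The function names and the demo archive are assumptions made for illustration; the EDoc server's own tooling is not shown in the slides.

```python
import io
import zipfile


def read_content_xml(sxw_bytes: bytes) -> str:
    """Return the text of content.xml from an .sxw archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as archive:
        return archive.read("content.xml").decode("utf-8")


def make_demo_sxw() -> bytes:
    """Build a tiny stand-in archive (hypothetical content, not a real thesis)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<doc><p>Introduction</p></doc>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_content_xml(make_demo_sxw()))
```

For a file on disk, the same call works on the bytes read from it; the later steps in the diagram (style mapping, splitting, HTML generation) are separate transformations.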

9 Principal Structure of a DiML document
[XML skeleton; the markup was lost in the transcript. Surviving element names, in order: title, author, abstract, bibliography, appendix, vita]

10 From flat structure to hierarchy
only two types of styles in Word
–paragraph styles
–character styles
e.g., for the first occurrence of the Heading 1 paragraph style, the converter has to know
–Heading 1 is the beginning of a chapter
–Heading 1 implies a head element
–the element chapter can only occur in body
[Example heading shown on slide: Introduction]
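
The three rules on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the converter used at Humboldt University: the input is assumed to be a flat list of (paragraph style, text) pairs, the style name "Standard" is hypothetical, and the element name p is borrowed from the module list on slide 22.

```python
import xml.etree.ElementTree as ET


def build_hierarchy(paragraphs):
    """Turn flat (style, text) pairs into a nested body/chapter tree."""
    body = ET.Element("body")
    chapter = None
    for style, text in paragraphs:
        if style == "Heading 1":
            # Heading 1 opens a new chapter, which may only occur in body,
            # and the heading text itself becomes the head element.
            chapter = ET.SubElement(body, "chapter")
            ET.SubElement(chapter, "head").text = text
        else:
            if chapter is None:
                raise ValueError("text before the first Heading 1 has no chapter")
            ET.SubElement(chapter, "p").text = text
    return body


flat = [
    ("Heading 1", "Introduction"),
    ("Standard", "First paragraph."),
    ("Heading 1", "Methods"),
    ("Standard", "Second paragraph."),
]
xml_text = ET.tostring(build_hierarchy(flat), encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

The point of the sketch is that the hierarchy is not present in the Word file; it is inferred from the order of the styles, so the converter has to carry knowledge of the target DTD.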

17 One Core – Multiple Views
HTML generation (static or dynamic)
–performance problems with XSLT and huge documents
–solution: division of XML sources into components (easier and faster to process)
PDF + Print on Demand (http://www.proprint-service.de)
Current problems
–changing Office systems and versions
–ongoing implementations and adaptations necessary
–but: might be restricted to XSL coding
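
A rough sketch of the "division into components" idea, assuming a document with front and chapter elements. The root element name "document" and the demo content are hypothetical; the output file names follow the pattern shown on slide 7 (front.xml, chapter1.xml, …). The actual splitting at the EDoc server may work differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; a real thesis would be far larger.
DEMO = (
    "<document><front><title>T</title></front>"
    "<body><chapter><head>A</head></chapter>"
    "<chapter><head>B</head></chapter></body></document>"
)


def split_into_components(document_xml: str) -> dict:
    """Map component file names to the serialized front matter and chapters."""
    root = ET.fromstring(document_xml)
    components = {}
    front = root.find("front")
    if front is not None:
        components["front.xml"] = ET.tostring(front, encoding="unicode")
    for number, chapter in enumerate(root.iter("chapter"), start=1):
        components[f"chapter{number}.xml"] = ET.tostring(chapter, encoding="unicode")
    return components


if __name__ == "__main__":
    for name, text in split_into_components(DEMO).items():
        print(name, text)
```

Each component can then be transformed on its own, so a stylesheet never has to hold the whole thesis in memory at once.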

18 Towards a universal DTD?
DiML – originally taken from an SGML DTD at Virginia Tech ("ETD"), http://edoc.hu-berlin.de/diml
–already many elements (> 100)
–combines elements of different description levels
–extended and adapted to local needs
special requirements from several departments (e.g., literature / dramatics, humanities, geography, …)
necessity to include external DTDs (e.g., CALS-Table, MathML, MusicML, …)
publication types other than theses and dissertations
–conference proceedings, electronic journals, other series, …
first approach: extend DTD aiming at a universal 'mega' DTD
–problems: complexity, difficult maintenance
other possibility: create a completely new DTD for each purpose
–loss of interoperability

19 Modular DTD Approach
idea: individually adapted DTDs
1. split up DTD into modules, such as
–text, structure, citation, dramatics
2. handle external DTDs as modules as well, e.g.,
–MathML, MusicML, CALS-Table
3. recombine a DTD out of user-selected modules
result
a. a DTD with only the needed elements and modules
b. individual reference and sample documents
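
Step 3, recombining a DTD out of user-selected modules, could be sketched as a simple concatenation of module fragments. The element declarations below are invented placeholders, not the real DiML declarations, and the function name is an assumption; the slides describe the actual system as using XSL and Java rather than Python.

```python
# Hypothetical module fragments; the real DiML declarations are not reproduced here.
MODULES = {
    "text": "<!ELEMENT em (#PCDATA)>\n<!ELEMENT strong (#PCDATA)>",
    "structure": "<!ELEMENT chapter (section*)>\n<!ELEMENT section (#PCDATA)>",
    "dramatics": "<!ELEMENT speech (#PCDATA)>",
}


def combine_dtd(selected):
    """Concatenate the fragments of the selected modules into one DTD text."""
    parts = []
    for name in selected:
        if name not in MODULES:
            raise KeyError(f"unknown module: {name}")
        parts.append(f"<!-- module: {name} -->\n{MODULES[name]}")
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    print(combine_dtd(["text", "structure"]))
```

Leaving "dramatics" out of the selection keeps its elements out of the result, which is the "only the needed elements" outcome named on the slide.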

20 Modular DTD Approach: Benefits
modules are easily maintainable
–distributed development
–version numbers for each module
reusability
–define (several) styles for each module
–reference information for each module
support different languages
get a DTD that exactly fits your needs

21 DTDSys: Principal Architecture
modules:
–small packages of elements belonging to each other
–stored in separate files in the DTDBase
–include metadata, e.g., descriptive information, version numbers, and dependencies on other modules
DTDSys generates DTD and reference files using
–XSL / XSLT
–Java
–Web Interfaces

22 Modules and Dependencies
text: br, em, strong, sup, sub, u, tt, pre
common: p, head, caption, url, name, foreign…
structure: chapter, section, subsection…
citation: quotations and references
documents: page numbers, footnotes, endnotes, …
diml: front, body, back, abstract…
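
Because modules depend on one another, a selection has to be expanded to include everything it relies on. The sketch below shows one way to do that with a depth-first walk. The module names come from the slide, but the dependency edges and version numbers are assumptions made up for the example; the transcript does not preserve the real dependency graph.

```python
# Module names are from the slide; versions and dependency edges are hypothetical.
MODULE_METADATA = {
    "text": {"version": "1.0", "depends_on": []},
    "common": {"version": "1.0", "depends_on": ["text"]},
    "structure": {"version": "1.0", "depends_on": ["common"]},
    "citation": {"version": "1.0", "depends_on": ["common"]},
}


def resolve(selected, metadata=MODULE_METADATA):
    """Return the selected modules plus their dependencies, dependencies first."""
    ordered = []
    visiting = set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"circular dependency at {name}")
        visiting.add(name)
        for dependency in metadata[name]["depends_on"]:
            visit(dependency)
        visiting.discard(name)
        ordered.append(name)

    for name in selected:
        visit(name)
    return ordered


if __name__ == "__main__":
    print(resolve(["citation", "structure"]))
```

Emitting dependencies first means that a module's declarations always appear after the declarations it builds on.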

23 DTD Generation Process
[Flow diagram. Labels in transcript order: reference, DTDBase, dependences.html, selection.xml, full-dtd.xml, xdiml.dtd, dtd-reference.xml, p.php, chapter.php, module-text.xml, XSL, Java+XSL, XSL, including element info, description, dependences]

24 Outlook
SCOPE = Service Core for Open Publishing Environments
–development of Publication Components (authoring tools, conversion mechanisms, layout and style definitions)
–management system to maintain versions and dependencies
–publication system
–workflow component
Long Term Preservation activities
–Implementation of OAIS reference model
–Sun Center of Excellence

25 Thanks to Sabine Henneberger, Jakob Voß, Matthias Schulz
Thank you! Questions?
u.mueller@cms.hu-berlin.de
http://edoc.hu-berlin.de/

