Workshop on XML-Based Library Applications 5


Workshop on XML-Based Library Applications
5. Library Applications (Part One)
Hong Kong University of Science & Technology Library
Updated 2003.11.21 10:02

Outline
Part One
- Using XSLT (New Acquisitions List)
- Metadata Design (Electronic Journals)
- Multi-Script Considerations (Theses and Antique Maps)
Part Two
- XML Name Access Control Repository

New Acquisitions List (1)
http://library.ust.hk/res/newbooks/
Design considerations:
- No need to build a database
- Static files, one set for each week
- Web interface by Perl script
- Weekly static files generated by a Perl script as a nightly batch job

New Acquisitions List (2)
Weekly list generation (workflow diagram):
1. Create Review List in INNOPAC, giving a list of III record numbers
2. Retrieve metadata by xrecord= command: xrecord requests go to INNOPAC, which sends the metadata back as IIIRECORDs
3. Transformation by XSLT stylesheets
4. Output: HTML pages and RSS files

New Acquisitions List (3)
XSLT transformation of IIIRECORD to New Acquisitions Record:
- Requires a few passes of XSLT
- A locally developed tool converts EACC codes in “braced form” to UTF-8
- Sample IIIRECORD
- Resulting record after XSLT transformation

New Acquisitions List (4)
Conclusion: By using Perl scripts and XSLT stylesheets, a list of XML-formatted bibliographic records extracted from INNOPAC can be transformed into two completely different outputs (views): an HTML web page and an RSS news feed.
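
The "one source, two views" idea in the conclusion can be illustrated with a minimal sketch. It does not use XSLT: it uses Python's standard-library ElementTree instead, only to show a single XML record list rendered both as an HTML fragment and as an RSS 2.0 feed. The record layout (`records`, `record`, `title`, `author`, `link`), the sample data, and the example.org URLs are assumptions made for illustration; the slides do not show the actual New Acquisitions Record format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified record list; not the real New Acquisitions format.
SAMPLE = """<records>
  <record>
    <title>An Example Title</title>
    <author>Example Author</author>
    <link>http://example.org/record/1</link>
  </record>
</records>"""


def to_html(root):
    """Render every record as an item of an HTML unordered list."""
    items = []
    for rec in root.iter("record"):
        title = escape(rec.findtext("title", default=""))
        author = escape(rec.findtext("author", default=""))
        link = escape(rec.findtext("link", default=""), quote=True)
        items.append(f'<li><a href="{link}">{title}</a> / {author}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def to_rss(root, channel_title="New Acquisitions"):
    """Render the same records as items of an RSS 2.0 channel."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    # Hypothetical channel link and description, required by RSS 2.0.
    ET.SubElement(channel, "link").text = "http://example.org/newbooks/"
    ET.SubElement(channel, "description").text = "Weekly list of new titles"
    for rec in root.iter("record"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec.findtext("title", default="")
        ET.SubElement(item, "link").text = rec.findtext("link", default="")
        ET.SubElement(item, "description").text = rec.findtext("author", default="")
    return ET.tostring(rss, encoding="unicode")


root = ET.fromstring(SAMPLE)
html_view = to_html(root)
rss_view = to_rss(root)
print(html_view)
print(rss_view)
```

In the workshop's setup the two views would come from two XSLT stylesheets applied to the same records; the two functions here play that role.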

Electronic Journals Online (1)
http://library.ust.hk/res/ejournals/
Design considerations:
- Requires a database (on Tamino)
- Metadata schema design
- Indexing design
- Weekly updating by Perl script
- Decided to use the Perl LibXML module instead of XSLT stylesheets

Electronic Journals Online (2)
Weekly update by Perl and LibXML2 (workflow diagram):
1. INNOPAC supplies the XML-formatted IIIRECORD
2. Extract elements
3. Construct EJ_RECORD
4. Load metadata to EJ Online
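
The extract-and-construct steps above can be sketched roughly as follows. The workshop used Perl with LibXML; this sketch uses Python's standard-library ElementTree instead. The abridged IIIRECORD layout (`VARFLD`, `MARCINFO`, `MARCTAG`, `MARCSUBFLD`, `SUBFIELDINDICATOR`, `SUBFIELDDATA`) is an assumption about the INNOPAC XML output, and the `title` and `link` elements inside EJ_RECORD are hypothetical, since the slides do not reproduce either record.

```python
import xml.etree.ElementTree as ET

# Assumed, abridged IIIRECORD; element names and data are illustrative only.
SAMPLE_IIIRECORD = """<IIIRECORD>
  <VARFLD>
    <MARCINFO><MARCTAG>245</MARCTAG></MARCINFO>
    <MARCSUBFLD>
      <SUBFIELDINDICATOR>a</SUBFIELDINDICATOR>
      <SUBFIELDDATA>Journal of Examples</SUBFIELDDATA>
    </MARCSUBFLD>
  </VARFLD>
  <VARFLD>
    <MARCINFO><MARCTAG>856</MARCTAG></MARCINFO>
    <MARCSUBFLD>
      <SUBFIELDINDICATOR>u</SUBFIELDINDICATOR>
      <SUBFIELDDATA>http://example.org/joe</SUBFIELDDATA>
    </MARCSUBFLD>
  </VARFLD>
</IIIRECORD>"""


def subfields(record, tag, code):
    """Collect the data of subfield `code` from every field with MARC tag `tag`."""
    values = []
    for varfld in record.iter("VARFLD"):
        if varfld.findtext("MARCINFO/MARCTAG") != tag:
            continue
        for sub in varfld.iter("MARCSUBFLD"):
            if sub.findtext("SUBFIELDINDICATOR") == code:
                values.append(sub.findtext("SUBFIELDDATA", default=""))
    return values


def build_ej_record(record):
    """Construct a hypothetical EJ_RECORD from the extracted elements."""
    ej = ET.Element("EJ_RECORD")
    for title in subfields(record, "245", "a"):
        ET.SubElement(ej, "title").text = title
    # MARC tag 856 subfield u carries the link to the electronic resource.
    for url in subfields(record, "856", "u"):
        ET.SubElement(ej, "link", href=url)
    return ej


ej_record = build_ej_record(ET.fromstring(SAMPLE_IIIRECORD))
print(ET.tostring(ej_record, encoding="unicode"))
```

Loading the result into the database is left out, because it depends on the Tamino interface, which the slides do not describe.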

Electronic Journals Online (3)
Metadata design: decided not to use Dublin Core (DC)
- The metadata is internal, not for exchange with external systems
- Incorporating DC adds programming overhead
- DC would have to be extended to mark up MARC Tag 856, the hypertext link to the electronic resource

Electronic Journals Online (4)
- Decided not to use RDF, for the same reasons as for Dublin Core, although RDF can resolve the Tag 856 markup problem that DC has
- Sample abridged EJ_RECORD

Antique Maps and Theses (1)
HKUST Theses: http://library.ust.hk/cgi/db/thesis.pl
Design considerations:
- Both databases are on Tamino
- Metadata as XML documents
- Hypertext links to PDF files

Antique Maps and Theses (2)
Multi-script considerations
Non-English characters:
- Diacritics
- Mathematical symbols and formulas
- Greek alphabet
- CJK
XML is UTF-8 by default
Tamino stores XML documents in Unicode

Antique Maps and Theses (3)
Unicode and UTF-8 explained:
- Developed by the Unicode Consortium (http://www.unicode.org) since 1991
- A character coding system for the written texts of diverse languages
- Latest version is 4.0, released in 2003
- Has 96,382 characters, 82,270 of which are CJK characters (including Hangul)

Antique Maps and Theses (4)
- Diacritics are encoded as combining characters, which are positioned relative to an associated base character (for example, a followed by a combining dot above is displayed as ȧ)
- UTF-8 transforms a Unicode scalar value into a sequence of 8-bit bytes
- Letters of the English alphabet take one byte; CJK ideographs in the Basic Multilingual Plane take three bytes
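
A short sketch of the two points above, using Python's standard `unicodedata` module. The choice of the dot-above diacritic follows the ȧ shown on the slide; the variable names are illustrative only.

```python
import unicodedata

base_plus_mark = "a\u0307"  # "a" followed by U+0307 COMBINING DOT ABOVE
precomposed = "\u0227"      # U+0227 LATIN SMALL LETTER A WITH DOT ABOVE

# NFC composes base + combining mark; NFD splits the precomposed form again.
composed = unicodedata.normalize("NFC", base_plus_mark)
decomposed = unicodedata.normalize("NFD", precomposed)

for label, text in [("combining", base_plus_mark), ("precomposed", precomposed)]:
    names = [unicodedata.name(ch) for ch in text]
    utf8_length = len(text.encode("utf-8"))
    print(f"{label}: {len(text)} code point(s), {utf8_length} UTF-8 byte(s), {names}")

# Byte counts for a Latin letter and a CJK ideograph.
print(len("A".encode("utf-8")), len("中".encode("utf-8")))
```

Both spellings display as the same character but are different code point sequences, so metadata is usually normalized to one form before it is indexed or compared.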

Antique Maps and Theses (5)
Examples of UTF-8 transformation:
- Latin letter A has the Unicode scalar value U+0041. It is transformed to \x41.
- Greek letter α has the Unicode scalar value U+03B1. It is transformed to \xCE\xB1.
- Chinese character 中 has the Unicode scalar value U+4E2D. It is transformed to \xE4\xB8\xAD.
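
The three examples above can be reproduced with a small hand-written encoder that follows the standard UTF-8 bit layout. This is a sketch for illustration; the function name `utf8_encode` is made up here, and in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 bytes."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0x7F:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Aα中":
    encoded = utf8_encode(ord(ch))
    # Cross-check against Python's built-in encoder.
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> " + "".join(f"\\x{b:02X}" for b in encoded))
```

The loop prints the same scalar value and byte sequence pairs as the slide.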

Antique Maps and Theses (6)
Demonstration: entering non-Latin characters into the metadata