Presentation transcript:

Delivering textual and visual resources

Overview
- Case studies
- Methods for providing access
- Structures for delivery:
  - Full text
  - Marked-up
  - Image and text
  - Indexed
- How-to guidance for:
  - Rekeying
  - OCR

ContentDM
- Off-the-shelf delivery system
- Easy to implement
- Can also be customized to some degree
- Handles images and metadata using Dublin Core
- Has useful features such as lightbox, collections and my favorites
- Proprietary solution but uses strong standards
- Requires no plug-ins

Greenstone
- Open source and open standards
- Developed by the New Zealand Digital Library
- Supported by UNESCO and available free from them
- Used by a number of digital library projects
- Strong metadata and image support
- Multilingual

Custom built solution: CVMA
- Corpus Vitrearum Medii Aevi: Medieval Stained Glass in Great Britain
- Technical solution developed by the Centre for Computing in the Humanities, King’s College London
- Database for complex information relationships
- XML for documents and text
- Clickable maps as navigation aid

Getting the text ready - decisions
Choices:
- Full text: every character and word is searchable, viewable and reusable in digital form
- Marked-up: as above, but with markup added to enable structured searches and use (e.g. XML, SGML)
- Image and text: an image is all the viewer sees; the text is fully searchable but is not seen or reusable
- Indexed: images/files attached to an index or catalogue

Getting the text ready - costs
- Full text: generally expensive in time and resources, but this depends on the source; for born-digital material it is very cheap
- Marked-up: usually the most expensive, because skilled staff are needed to mark up intellectual content, although some automated systems exist for format-based markup
- Image and text: comparatively cheap, but with some usability downsides
- Indexed: great if an index or catalogue already exists and the file can simply be linked to a record (e.g. MARC)

Full text
- Files (e.g. PDF, Word)
- Formatted text (e.g. HTML)
- Fully searchable
- Reusable: copy, edit, share
- Very high accuracy, i.e. users expect 100%
- Unstructured searches
- Results can be overwhelming
- Born digital: reformatting for delivery needs to be considered
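To make the point about unstructured searches concrete, here is a minimal sketch in Python. The function name `full_text_search` and the document identifiers are hypothetical. Two of the sample texts come from the trial passage used in the markup slides, and the third ("advert-1") is invented for contrast.

```python
import re

def full_text_search(documents, term):
    """Return (doc_id, snippet) for every case-insensitive occurrence of term."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            # Keep a little context either side of the match.
            start = max(match.start() - 20, 0)
            end = min(match.end() + 20, len(text))
            hits.append((doc_id, text[start:end]))
    return hits

# Hypothetical sample collection; "advert-1" is invented for illustration.
documents = {
    "trial-1": "Thomas Knight was indicted for the wilful murder of Robert Ball.",
    "trial-2": "Michael Ball. The deceased Robert Ball was my son; he was a clock-case maker.",
    "advert-1": "Lost at the ball on Tuesday, a silver watch.",
}

hits = full_text_search(documents, "ball")
for doc_id, snippet in hits:
    print(doc_id, "...", snippet)
```

Every occurrence of the character string is returned, whether it is a surname or a dance, which is why results from unstructured searches can be overwhelming on a large collection.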

Markup
- Advantage of structured search and use
- Complex to create specifications and workflow from scratch
- Delivering requires a description of the codes, rules and documents used
- Most projects will adapt a specification that already exists:
  - TEI: Text Encoding Initiative
  - EAD: Encoded Archival Description
- Some automation is possible, and some system solutions enable this

Markup: examples
Thomas Knight was indicted for the wilful murder of Robert Ball. He stood charged on the Coroner's inquest for manslaughter, September 7. Michael Ball. The deceased Robert Ball was my son; he was a clock-case maker; the prisoner and he had been fighting some time; he stood up against a wall; I said, Robert, will you fight any more? he said, yes; they fought again. I saw but little of it.

Markup
Two forms commonly used:
- Layout and structure based (format)
Thomas Knight was indicted for the wilful murder of Robert Ball. He stood charged on the Coroner's inquest for manslaughter, September 7. Michael Ball. The deceased Robert Ball was my son; he was a clock-case maker; the prisoner and he had been fighting some time; he stood up against a wall; I said, Robert, will you fight any more? he said, yes; they fought again. I saw but little of it.

Markup
- Content based (function)
Thomas Knight was indicted for the wilful murder of Robert Ball. He stood charged on the Coroner's inquest for manslaughter, September 7. Michael Ball. The deceased Robert Ball was my son; he was a clock-case maker; the prisoner and he had been fighting some time; he stood up against a wall; I said, Robert, will you fight any more? he said, yes; they fought again. I saw but little of it.
The two forms can be combined to deliver function and format at the same time
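The difference between the two forms of markup can be sketched with a shortened version of the trial passage. The tags in the transcript have not survived, so every element and attribute name below (`trial`, `p`, `i`, `charge`, `testimony`, `person`, `role`, `offence`, `occupation`) is a hypothetical choice made for illustration, not the encoding used in the original slides.

```python
import xml.etree.ElementTree as ET

# Format-based markup: records how the text is laid out (hypothetical tags).
format_xml = """
<trial>
  <p>Thomas Knight was indicted for the wilful murder of Robert Ball.</p>
  <p><i>Michael Ball.</i> The deceased Robert Ball was my son; he was a clock-case maker.</p>
</trial>
"""

# Content-based markup: records what each piece of text is (hypothetical tags).
content_xml = """
<trial>
  <charge><person role="defendant">Thomas Knight</person> was indicted for the wilful
  <offence>murder</offence> of <person role="victim">Robert Ball</person>.</charge>
  <testimony><person role="witness">Michael Ball</person>. The deceased
  <person role="victim">Robert Ball</person> was my son; he was a
  <occupation>clock-case maker</occupation>.</testimony>
</trial>
"""

def people_with_role(xml_text, role):
    """Structured search: names of all <person> elements with the given role."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("person") if p.get("role") == role]

defendants = people_with_role(content_xml, "defendant")
occupations = [o.text for o in ET.fromstring(content_xml).iter("occupation")]
italic_runs = [i.text for i in ET.fromstring(format_xml).iter("i")]

print(defendants, occupations, italic_runs)
```

The format-based version can only answer questions such as "what is in italics?", while the content-based version supports searches such as "who was the defendant?". Running `people_with_role` on `format_xml` returns nothing, because that markup never says who anyone is.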

Markup languages
- Markup is a language, not a programming tool
- All use tags or elements: software interprets those tags for display purposes and/or for search and retrieval
- Allows users (or communities of users) to create their own tag sets
- Markup can encode both logical and physical features of text

Markup languages
- SGML: Standard Generalised Markup Language (ISO standard in 1986)
  - Father of all markup languages
- HTML: Hypertext Markup Language (first published in 1991)
  - Markup of ‘physical’ features of articles to enable Internet sharing of content; it is about the format of content
- XML: Extensible Markup Language (W3C Recommendation in 1998)
  - SGML lite, to enable generic Web use of powerful SGML features; it is about the function of content

XML: bits and pieces
- XML Content (.xml)
- XML Rules (.dtd)
- Schemas, e.g. TEI, METS
- DTDs = Document Type Definitions
- Namespaces (used when you want to combine sets of rules together in a single document)
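As a small illustration of namespaces, the sketch below mixes a Dublin Core `title` and a TEI `title` in one document and retrieves each separately. The two namespace URIs are the published ones for Dublin Core elements and TEI. The wrapper element `record` and the title values are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical record that combines two sets of rules in a single document.
doc = f"""
<record xmlns:tei="{TEI}" xmlns:dc="{DC}">
  <dc:title>Trial of Thomas Knight</dc:title>
  <tei:title>Trial of Thomas Knight, for murder</tei:title>
</record>
"""

root = ET.fromstring(doc)
ns = {"tei": TEI, "dc": DC}

# Both elements are called "title"; the namespace says whose rules apply.
dc_title = root.find("dc:title", ns).text
tei_title = root.find("tei:title", ns).text

print(dc_title)
print(tei_title)
```

Without the prefixes the two `title` elements would be indistinguishable; with them, software can tell which rule set each one belongs to.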

DTD explained
- A DTD is the formal definition of the elements, structures, and rules for marking up a given type of XML document
- Think of it as an abstraction of the document structure:
  - What tags and elements must/can be used
  - How these tags and elements are structured in relation to each other
- Allows Internet browsers and other software to understand how to interpret XML content
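The idea of a DTD as an abstraction of document structure can be sketched as follows. Python's standard XML parser does not validate against DTDs, so this example uses a simplified stand-in: a dictionary listing which child elements each element may contain. The toy DTD text, the element names and the function `structure_errors` are all hypothetical, and the check ignores element order and counts, which a real validating parser would enforce.

```python
import xml.etree.ElementTree as ET

# What a DTD for this toy document type might say (illustrative only;
# the code below does not read it).
DTD_TEXT = """
<!ELEMENT trial (charge, testimony*)>
<!ELEMENT charge (#PCDATA | person)*>
<!ELEMENT testimony (#PCDATA | person)*>
<!ELEMENT person (#PCDATA)>
"""

# Simplified stand-in for the DTD: allowed child elements per element.
ALLOWED_CHILDREN = {
    "trial": {"charge", "testimony"},
    "charge": {"person"},
    "testimony": {"person"},
    "person": set(),
}

def structure_errors(xml_text):
    """List places where the document breaks the allowed-children rules."""
    errors = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        if parent.tag not in ALLOWED_CHILDREN:
            errors.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors

good = "<trial><charge><person>Thomas Knight</person> was indicted.</charge></trial>"
bad = "<trial><person>Thomas Knight</person></trial>"

print(structure_errors(good))
print(structure_errors(bad))
```

The second document is well-formed XML but breaks the rules for this document type, which is the distinction a DTD lets software make.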

XML: further bits and pieces
- Entities (.ent)
  - Reusable data inside a DTD or within markup
  - Think of entities as variables that can be used to define common text (e.g. copyright information). You can then use the entity anywhere you would normally use the text.
- Display (.css & .xsl)
  - Cascading Style Sheets (.css)
  - Extensible Stylesheet Language (.xsl)
    - Used for transforming data to another structure
    - Used for formatting objects
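The "entities as variables" idea can be shown with a short sketch. The entity name `rights`, the element names and the copyright wording are invented for illustration. The entity is declared once in the document's internal DTD subset and then used wherever the text is needed.

```python
import xml.etree.ElementTree as ET

# The entity "rights" is defined once and referenced twice (hypothetical names).
doc = """<!DOCTYPE page [
  <!ENTITY rights "Copyright the holding institution">
]>
<page>
  <footer>&rights;</footer>
  <notice>Images: &rights;</notice>
</page>"""

# The parser replaces each &rights; reference with the declared text.
root = ET.fromstring(doc)
footer = root.find("footer").text
notice = root.find("notice").text

print(footer)
print(notice)
```

Changing the declaration changes every place the entity is used, which is the practical benefit for repeated text such as copyright statements.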

Image and text
- Image is delivered and text is fully searchable but not viewable
- Text usually created by uncorrected OCR
- Different ways to do this:
  - Use a PDF document with image and text
  - Deliver an image with text that has been extracted to a searchable database, e.g. JSTOR
  - Deliver an image with text that has very basic markup (possibly just pages defined) and is searched as XML
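The second approach, an image delivered alongside text held in a searchable database, might look roughly like this. It is a sketch under stated assumptions, not a description of how JSTOR or any other service works: the table layout, file names and function `find_page_images` are hypothetical, and the OCR error ("rnurder" for "murder") is planted deliberately.

```python
import sqlite3

# In-memory database: one row per page image, with its uncorrected OCR text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (image_file TEXT PRIMARY KEY, ocr_text TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        # "rnurder" is a deliberate OCR misreading of "murder".
        ("trial_p001.tif", "Thomas Knight was indicted for the wilful rnurder of Robert Ball."),
        ("trial_p002.tif", "The deceased Robert Ball was my son; he was a clock-case maker."),
    ],
)

def find_page_images(term):
    """Search the hidden OCR text and return the page images to display."""
    rows = conn.execute(
        "SELECT image_file FROM pages WHERE ocr_text LIKE ? ORDER BY image_file",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]

print(find_page_images("Robert Ball"))
print(find_page_images("murder"))
```

The user only ever sees the page image, so OCR errors are invisible on screen, but they still cause missed hits: the search for "murder" fails to find the first page. This is one of the usability downsides of relying on uncorrected OCR.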

Indexed
- Basically just linking text or document formats to a subject index or resource catalogue
- Makes sense and is low cost where the index resource already exists
- Not so good if the index/catalogue has to be created, as this part is costly; in that circumstance XML might be better
- Delivered as a link within the index/catalogue that directs the user to the single text/document file
- Often used with MARC records or museum Content Management Systems
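A minimal sketch of the indexed approach: each catalogue record carries a link to a single file, and searching happens only in the index. The records, field names, URLs and function `search_index` are hypothetical. In a MARC record the link would typically be held in field 856, subfield u.

```python
# Hypothetical catalogue: the index is searched, the file is only linked to.
catalogue = [
    {
        "id": "rec-001",
        "title": "Trial of Thomas Knight",
        "subjects": ["Trials", "Murder"],
        "file_url": "https://example.org/texts/knight.pdf",
    },
    {
        "id": "rec-002",
        "title": "Medieval stained glass survey",
        "subjects": ["Stained glass"],
        "file_url": "https://example.org/texts/glass.pdf",
    },
]

def search_index(records, subject):
    """Return (title, link) for records indexed under the given subject."""
    wanted = subject.lower()
    return [
        (record["title"], record["file_url"])
        for record in records
        if wanted in (s.lower() for s in record["subjects"])
    ]

results = search_index(catalogue, "trials")
print(results)
```

Nothing inside the linked file is searchable here; retrieval is only as good as the index terms, which is why the approach is cheap when a catalogue already exists and costly when it does not.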