Document Computing Technologies for Managing Electronic Document Collections Ross Wilkinson... [et al.] Circulation Counter [RES3H] ZA4080.D63 1998.



Chapter 1 Document Lifecycle

What is a document? A document records a message from people to people.

Characteristics of a document Content Structure Metadata

A message has a context, which is important for understanding the message. A document contains not only the contents of a message, but also some information about the document, e.g. author, date, recipients. We call such information the metadata about the document.

Why Document Management? It is hard to find documents. It is hard to organize documents. It is hard to control documents. Metadata helps document management.

Benefits of Document Management Location-independent delivery of documents upon demand Controlled access to documents A record of the life of a document Better re-use of documents

Chapter 2 Electronic Document Description

Document Content Simplest type of content – unformatted text Text retrieval system based on search by keywords E.g. Windows Desktop Search Optical character recognition (OCR) system
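Keyword-based text retrieval over unformatted text can be sketched with a small inverted index. This is only an illustration, not how Windows Desktop Search works; the two sample documents and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical two-document collection of unformatted text.
documents = {
    "memo-1": "I have finished Stage B of the design. Could you take a look at it?",
    "memo-2": "Stage C of the design starts next week.",
}
index = build_index(documents)
print(search(index, "design stage"))  # documents containing both words
print(search(index, "finished"))      # documents containing "finished"
```

The index is built once and every query is answered by intersecting word sets, which is why keyword search stays fast as the collection grows.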

Document Structure Even unformatted text has some structures, e.g. lines, words, images, etc. A document may have elaborate structures. Two levels of structures: –Logical structure –Presentational structure

Logical structures Example: TO: John D. FROM: Kate M. DATE: 7/8/98 I have finished Stage B of the design. Could you take a look at it? Simple logical structure: lines of text A logical structure of a memo: (see next slide)

A logical structure for a memo Memo Head SenderReceiverDate Body Paragraph

Presentational Structure A different presentational structure for the same memo John D., 7/8/98 I have finished Stage B of the design. Could you take a look at it? Kate M.
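The separation between logical and presentational structure can be sketched in code: one logical tree for the memo, rendered in the two layouts shown in the slides. The element names (memo, head, sender, receiver, date, body, paragraph) follow the logical structure slide, but their spelling and the function names here are assumptions.

```python
import xml.etree.ElementTree as ET


def build_memo(sender, receiver, date, paragraphs):
    """Build the logical structure: memo -> head (sender, receiver, date) + body."""
    memo = ET.Element("memo")
    head = ET.SubElement(memo, "head")
    ET.SubElement(head, "sender").text = sender
    ET.SubElement(head, "receiver").text = receiver
    ET.SubElement(head, "date").text = date
    body = ET.SubElement(memo, "body")
    for text in paragraphs:
        ET.SubElement(body, "paragraph").text = text
    return memo


def render_header_style(memo):
    """First presentation: TO/FROM/DATE lines followed by the body."""
    lines = [
        "TO: " + memo.findtext("head/receiver"),
        "FROM: " + memo.findtext("head/sender"),
        "DATE: " + memo.findtext("head/date"),
    ]
    lines += [p.text for p in memo.iterfind("body/paragraph")]
    return "\n".join(lines)


def render_letter_style(memo):
    """Second presentation: receiver and date first, sender as a signature."""
    lines = [memo.findtext("head/receiver") + ", " + memo.findtext("head/date")]
    lines += [p.text for p in memo.iterfind("body/paragraph")]
    lines.append(memo.findtext("head/sender"))
    return "\n".join(lines)


memo = build_memo(
    "Kate M.",
    "John D.",
    "7/8/98",
    ["I have finished Stage B of the design. Could you take a look at it?"],
)
print(render_header_style(memo))
print()
print(render_letter_style(memo))
```

Both renderings read from the same tree, so changing the layout never touches the content.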

Presentation medium The content of the same document can be presented in different media with different presentational structures: E.g. a PDF file vs. an online Web page

Metadata Generally, we need metadata to capture: –Registration information –Usage information –Structural properties –Contextual information –Content description –Historical information

The Dublin Core metadata set Title Creator Subject Description Publisher Contributor Date Type Format: e.g. HTML, PDF Identifier: e.g. URI Source Language Relation Coverage: duration Rights: e.g. copyright
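As a minimal sketch, a Dublin Core record can be held as a mapping from element name to value and embedded in a Web page as `<meta>` tags using the common `DC.` name prefix. The record values and the function name below are hypothetical.

```python
from html import escape


def dublin_core_meta(record):
    """Render each metadata element as one HTML <meta> tag."""
    tags = []
    for element, value in record.items():
        tags.append(
            '<meta name="DC.{}" content="{}">'.format(
                escape(element, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(tags)


# Hypothetical record for the memo used in the earlier slides.
record = {
    "title": "Memo on Stage B of the design",
    "creator": "Kate M.",
    "date": "7/8/98",
    "type": "Text",
    "format": "text/html",
    "language": "en",
}
print(dublin_core_meta(record))
```

Escaping the values keeps the generated markup intact even when a title contains quotation marks or angle brackets.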

Document Description Language (DDL) For use by document management systems E.g. RTF, PostScript, SGML DDL support: –Language support, media support, transparency, structure, link support, metadata support Other DDL characteristics: –Document creation, import conversion, export transformation, update, presentation quality, presentation flexibility, etc.

Examples of DDLs ASCII (American Standard Code for Information Interchange) Unicode ASCII and Unicode offer very limited support Rich Text Format TeX and LaTeX SGML, HTML, XML PostScript, PDF

Rich Text Format (RTF) Developed by Microsoft For interchange between Microsoft Word and other software Main purposes: –Preserve information in Word (blocks of text) Example: next slide

{\rtf1\adeflang1025\ansi\ansicpg1252\uc2\adeff0\deff0\stshfdbch13\stshfloch0\stshfhich0\stshfbi0\deflang2057\deflangfe1028{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose }Times New Roman … {\title John D}{\author Dr. Yeung}{\operator Dr. Yeung}{\creatim\yr2008\mo3\dy18\hr15\min24}{\revtim\yr2008\mo3\dy18\hr15\min25}{\version1}{\edmins1}{\nofpages1}{\nofwords14}{\nofchars81}{\*\company Lingnan University}{\nofcharsws94} … \ltrch\fcs0 \insrsid \charrsid \hich\af0\dbch\af13\loch\f0 John D., 7/8/98 \par \hich\af0\dbch\af13\loch\f0 I have finished Stage B of the design. Could you take a look at it? \par \par \hich\af0\dbch\af13\loch\f0 Kate M\hich\af0\dbch\af13\loch\f0. \par }\pard \ltrpar\ql \li0\ri0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid \par }}

TeX and LaTeX TeX was created by Donald Knuth. TeX is typesetting software. LaTeX was created by Leslie Lamport on top of TeX. LaTeX uses markup constructs to separate logical description from presentation. LaTeX example: see next slide

\documentclass{article}
\usepackage{times}
\pagestyle{empty}
\begin{document}
\title{Sample Document}
\author{W. L. Yeung\\Department of Computing and Decision Sciences\\Lingnan University, Hong Kong}
\maketitle
\section{Introduction}
…
\section{Conclusion}
…
\end{document}

SGML Standard Generalized Markup Language To describe a document in SGML, we need: –An SGML declaration –A document type definition (DTD) –A document instance An SGML declaration specifies which characters are used in the DTD. Normally a default is used.

SGML (cont.) A document type definition (DTD) defines the rules for forming a class of documents, i.e. the grammar of a document class. The building blocks of SGML documents are elements. A DTD for the memo document: next slide.

DTD An element definition gives the name of the element, then the rules for building that element. Elements can contain other elements. Terminal (basic) elements often consist of parsed character data “#PCDATA” or unparsed character data “CDATA”.
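The idea of element definitions as grammar rules can be sketched as a small conformance checker. This is not a DTD parser or a real validator: the rule table, the element names, and the declarations quoted in the comments are hypothetical stand-ins for the memo DTD, and only sequences with "exactly one" or "one or more" occurrences are handled.

```python
import xml.etree.ElementTree as ET

PCDATA = "#PCDATA"

# Hypothetical content models, mirroring declarations such as
#   <!ELEMENT memo (head, body)>
#   <!ELEMENT head (sender, receiver, date)>
#   <!ELEMENT body (paragraph+)>
#   <!ELEMENT sender (#PCDATA)>
MEMO_RULES = {
    "memo": [("head", "1"), ("body", "1")],
    "head": [("sender", "1"), ("receiver", "1"), ("date", "1")],
    "body": [("paragraph", "+")],
    "sender": PCDATA,
    "receiver": PCDATA,
    "date": PCDATA,
    "paragraph": PCDATA,
}


def conforms(element, rules):
    """Return True if the element and all its descendants follow the rules."""
    model = rules.get(element.tag)
    if model is None:
        return False  # undeclared element
    children = list(element)
    if model == PCDATA:
        return not children  # character data only, no child elements
    position = 0
    for name, occurrence in model:
        count = 0
        while position < len(children) and children[position].tag == name:
            count += 1
            position += 1
        if count == 0 or (occurrence == "1" and count > 1):
            return False
    if position != len(children):
        return False  # unexpected trailing child elements
    return all(conforms(child, rules) for child in children)


good = ET.fromstring(
    "<memo><head><sender>Kate M.</sender><receiver>John D.</receiver>"
    "<date>7/8/98</date></head>"
    "<body><paragraph>I have finished Stage B of the design.</paragraph></body></memo>"
)
bad = ET.fromstring("<memo><body><paragraph>No head element.</paragraph></body></memo>")
print(conforms(good, MEMO_RULES))
print(conforms(bad, MEMO_RULES))
```

Each entry in the table plays the role of one element definition: a name, then the rule for what the element may contain.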

The memo in SGML John D Kate M 7/8/1998 I have finished Stage B of the design.

HTML Hypertext Markup Language For World Wide Web (WWW) documents Conforms to an SGML DTD HTML is presentation oriented: instructions (tags) are inserted into a document for presentation effects The DTD for HTML is available on

The memo in HTML Memo Memo I have finished Stage B of the design.

XML Extensible Markup Language Three basic definitions: –XML for representing data and documents –XLink and XPointer for representing inter- document linking –XSL for representing presentation XML is a near-subset of SGML

XML (Cont.) Two classes of XML documents: –Valid XML documents: documents that conform to a specific supplied DTD –Well-formed documents: only satisfy a simple default grammar, without conforming to a specific DTD XML has become the cornerstone of electronic commerce as it allows businesses to exchange electronic documents according to some standard formats based on XML.
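The weaker of the two classes, well-formedness, can be checked with any XML parser. A minimal sketch using Python's standard library follows; the function name and sample strings are hypothetical. Note that `xml.etree.ElementTree` is a non-validating parser, so it cannot tell whether a document is valid against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, regardless of any DTD."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<memo><body>Stage B is finished.</body></memo>"))
print(is_well_formed("<memo><body>Stage B is finished.</memo></body>"))  # tags overlap
print(is_well_formed("<memo><br></memo>"))  # <br> is never closed
```

A document that passes this check may still be invalid: validity additionally requires that the element structure matches the rules of a specific DTD.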

PostScript Developed by Adobe For representing documents that are to be printed (mainly on laser printers) A page description language optimized for printing text, images, graphics.

Portable Document Format (PDF) Developed by Adobe A page description language for representing text, graphics and images A PDF file contains presentation information on pages, annotations, links, fonts, etc. Supports delivery of electronic documents exactly as they would appear in printed form. Not designed for editing or document format exchange.