TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C.

1

2 TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C

3 TEI History
The developing organizations first met in 1987
–Association for Computers and the Humanities (ACH)
–Association for Computational Linguistics (ACL)
–Association for Literary and Linguistic Computing (ALLC)
1990: first version, TEI P1
1992: TEI P2
1994: TEI P3

4 TEI History Continued
Principles for the development of TEI
–Standard format for data interchange in humanities research
–Guidelines for encoding texts in the same format
–Define a recommended syntax
–Define a metalanguage for the description of text-encoding schemes
Future Developments
–Linguistic description and grammatical annotation
–Historical analysis and interpretation
–Base tag sets for further document types
–Manuscript analysis and physical description of text

5 General Introduction to SGML and XML

6 The Evolution of SGML and XML
1960’s: Generalized Markup Language (GML) developed by IBM
1970’s & 1980’s: ANSI initiates a project to develop a standard text-description language based on GML
1983: SGML became an industry standard
1986: ISO ratified a standard for SGML
1990’s: Tim Berners-Lee developed HTML, a simple formatting markup language for the World Wide Web
Mid 1990’s: XML was developed by the W3C to combine the flexibility of SGML and the simplicity of HTML

7 Benefits of SGML and XML
SGML is a toolkit for developing specialized markup languages
–Specifies the structure of information
–Enables interoperability between multiple platforms
–Acts like a database
–All-encompassing
The DTD acts as a blueprint for document structure
XML provides a manageable framework in which you can define your own elements

8 XML Syntax
Information content must have start and end tags
–Case is significant
–Elements may not overlap
–Elements can nest one inside another
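
The syntax rules on this slide can be tried against a parser. The sketch below is only an illustration using Python's standard library; the sample strings and the helper name is_well_formed are made up for this example and are not from the presentation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule on the slide.
samples = {
    "nested elements": "<p>Some <hi>nested</hi> text</p>",
    "case mismatch": "<p>Some text</P>",
    "overlapping elements": "<p>Some <hi>overlapping</p></hi>",
    "missing end tag": "<p>Some text",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; the other three each break one of the rules listed above.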

9 The XML Environment
XML Editor
XML Parser/Validator
Display program
DTD or schema to define elements
Style sheet for display of elements

10 The XML Document
Document prologue
–XML declaration
–Document type declaration
  Points to root element
  Points to external standards (DTDs, namespaces)
Document itself
–Bracketed by root element
–Contains elements, attributes, entities
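
As a rough illustration of the parts named on this slide, the sketch below embeds a small document (prologue, root element, an attribute, and an entity) in a Python string and parses it with the standard library. The element names, the id value, and the entity name tei are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: XML declaration, document type declaration
# (naming the root element and declaring one entity), then the document itself.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY tei "Text Encoding Initiative">
]>
<article id="a1">
  <title>Notes on the &tei;</title>
  <para>First paragraph.</para>
</article>
"""

root = ET.fromstring(DOCUMENT)
print(root.tag)                 # the root element brackets the document
print(root.attrib)              # attributes of the root element
print(root.find("title").text)  # the entity reference is expanded by the parser
```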

11 The Document Type Definition

12 The DTD Document Type Definition
DTD defines a document’s structure, i.e. it is a set of rules and declarations that specify what tags can be used and what these tags can contain
DTD validates documents
- determines which documents conform to the language
- reduces the possibility of errors
DTD provides a blueprint for documents
- specifies how to handle elements
- specifies which elements are allowed

13 The DTD Document Type Definition
The DTD has four main functions:
1. declares a set of allowed elements (“vocabulary”)
2. defines a content model for each element (“grammar”)
3. declares a set of allowed attributes for each element
4. provides various mechanisms to make management of the model easier
(Ray, Chapter 5, p 148)

14 Basic Structure of a DTD -Element Declaration-
Holds two functions:
1. Adds a new element
2. States what can go inside the element
Every element that appears in the document must be declared in the DTD
Order of declarations can be important (for example, when an entity is declared more than once, the first declaration is used)

15 <!ELEMENT name content-model>
“vocabulary”
Denotes the NAME of the element that appears in the mark-up tag (case-sensitive; lowercase), e.g. title, graphic, article, thingie
“grammar”
Formula that delineates what kind of content, how many, and in what order
1. Empty elements: EMPTY
2. No content restrictions (little value): ANY
3. Only character data, no elements: (#PCDATA)
4. Only elements: formula
5. Mixed content: content model
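
To make the five kinds of content model concrete, here is a minimal sketch that feeds a few element declarations to the expat parser in Python's standard library and reports how each one is classified. The DTD itself is hypothetical (it reuses the slide's example names title, graphic, article, thingie, plus an assumed para element), and the labels in KINDS are wording chosen for this example.

```python
from xml.parsers import expat

# Hypothetical internal DTD subset; not taken from any real TEI DTD.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ELEMENT article (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA | graphic)*>
  <!ELEMENT graphic EMPTY>
  <!ELEMENT thingie ANY>
]>
<article><title>T</title><para>Text <graphic/></para></article>
"""

# Map expat's content-model type codes to the categories on the slide.
KINDS = {
    expat.model.XML_CTYPE_EMPTY: "EMPTY",
    expat.model.XML_CTYPE_ANY: "ANY",
    expat.model.XML_CTYPE_MIXED: "character data or mixed content",
    expat.model.XML_CTYPE_SEQ: "only elements (sequence)",
    expat.model.XML_CTYPE_CHOICE: "only elements (choice)",
}

declared = {}


def on_element_decl(name, content_model):
    # content_model is a (type, quantifier, name, children) tuple.
    declared[name] = KINDS.get(content_model[0], "other")


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOCUMENT, True)

for name, kind in declared.items():
    print(f"{name}: {kind}")
```

Note that expat only reads the declarations here; it does not validate the document against them.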

16 Basic Structure of a DTD -Attribute Declaration-
<!ATTLIST name
  attname1 atttype1 attdesc1
  attname2 atttype2 attdesc2>
For each element that appears in the document, the attributes of the element must be declared
All attributes of an element are declared in one place, the attribute list

17 “vocabulary”
Name of the element to which the attributes belong
Same name as the element declared earlier, e.g. title, article, thingie
“Attribute declarations”
attname1: gives the attribute name
atttype1: specifies the datatype of the attribute or a list of values, e.g. CDATA, NMTOKEN, ID
attdesc1: describes behavior
1. default value, e.g. “high”
2. author-specified value: #REQUIRED, #FIXED, #IMPLIED
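
The following sketch shows one possible attribute list and how a parser reports it, using only Python's standard library. The element name article and the attributes id, priority, and source are assumptions for illustration; only the default value "high" echoes the slide.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Hypothetical attribute list: one required ID, one enumerated attribute
# with the default value "high", and one optional CDATA attribute.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ATTLIST article
    id       ID            #REQUIRED
    priority (low | high)  "high"
    source   CDATA         #IMPLIED>
]>
<article id="a1"/>
"""

declarations = []


def on_attlist_decl(element, attribute, att_type, default, required):
    # Called once per declared attribute, in declaration order.
    declarations.append((element, attribute, att_type, default, bool(required)))


parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOCUMENT, True)

for row in declarations:
    print(row)

# The declared default is supplied when the document omits the attribute.
root = ET.fromstring(DOCUMENT)
print(root.attrib)
```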

18 The DTD Document Type Definition
“It is important to remember that every document type definition is an interpretation of a text. There is no single DTD which encompasses any kind of absolute truth about a text, although it may be convenient to privilege some DTDs above others for particular types of analysis.”
TEI Guidelines for Electronic Text Encoding and Interchange
http://etext.virginia.edu/TEI.html

19 The TEI DTD
Uses basic structural elements of a general DTD
Designed to simplify the task of choosing an appropriate set of tags for the text in hand
Selects an appropriate combination of smaller tag sets, each containing some set of tags likely to be used together
1. core tag sets – standard components that are always included, no encoder action
2. base tag sets – basic building blocks for text types, encoder must select at least one
3. additional tag sets – extra tags compatible with all other tag sets, encoder may add them to the base tag sets in any combination
http://www.tei-c.org/P4X/DTD/

20 The TEI Header

21 Basic Elements of TEI
Paragraphs
Punctuation, Quotations or Lists, etc.
Bibliographic Citations
THE HEADER!

22 The TEI Header
Required of every TEI text, composed of four parts
May be large and complex or very simple
The header may differ for documents not based on written text, such as computer files or spoken text
The header is not a library cataloging record, although the intent is similar

23 Four Parts
File Description
Encoding Description
Profile Description
Revision Description
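
A skeleton of the four parts might look like the sketch below. The element names teiHeader, fileDesc, encodingDesc, profileDesc, and revisionDesc are the TEI names for these parts; everything inside them is placeholder text invented for this example, and the skeleton has not been validated against a TEI DTD. The Python code only checks that the four parts are present and in order.

```python
import xml.etree.ElementTree as ET

# Placeholder header: the child content is illustrative only.
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt><title>Placeholder title</title></titleStmt>
    <publicationStmt><p>Placeholder publication details</p></publicationStmt>
    <sourceDesc><p>Placeholder description of the source</p></sourceDesc>
  </fileDesc>
  <encodingDesc><p>Placeholder note on encoding practice</p></encodingDesc>
  <profileDesc><langUsage><language>English</language></langUsage></profileDesc>
  <revisionDesc><list><item>Placeholder change note</item></list></revisionDesc>
</teiHeader>
"""

EXPECTED_PARTS = ["fileDesc", "encodingDesc", "profileDesc", "revisionDesc"]

header = ET.fromstring(HEADER)
parts = [child.tag for child in header]
missing = [name for name in EXPECTED_PARTS if name not in parts]

print("parts found:", parts)
print("missing parts:", missing)
```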

24 File Description

25 Encoding Description

26 Profile Description

27 Revision Description

28 Examples and Application

29 Dumble Geological Survey
–A geological survey of Texas from the late 19th century, comprising twelve volumes
Digitally imaged monographs processed with OCR software to produce text
Text marked up in XML using the TEI Lite specifications
http://www.lib.utexas.edu/books/dumble/

30 Dumble DTD
Element and Attribute definitions
Entity references

31

32

33 Dumble Header
Four basic sections
–File description
–Encoding description
–Profile description
–Revision description
Contains bibliographic information
Contains information on the creation of the digital file

34

35

36

37 Why XML?
Ability to record information about a document within the document
Ability to separate structure from format
Ability to “wrap” or embed information in layers of XML
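
One way to picture the separation of structure from format: the same marked-up source can be rendered in more than one way without touching the markup. The sketch below is an assumption-laden illustration in Python; the sample document and the two rendering functions are hypothetical and stand in for what a style sheet would normally do.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document: structure only, no formatting instructions.
SOURCE = "<article><title>Geology of Texas</title><para>Rocks &amp; minerals.</para></article>"


def to_plain_text(root):
    """Render the title in capitals, then each paragraph on its own line."""
    lines = [root.findtext("title").upper(), ""]
    lines += [para.text for para in root.findall("para")]
    return "\n".join(lines)


def to_html(root):
    """Render the same structure as an HTML heading and paragraphs."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(para.text)}</p>" for para in root.findall("para")]
    return "\n".join(parts)


root = ET.fromstring(SOURCE)
print(to_plain_text(root))
print(to_html(root))
```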

38 XML Beyond TEI
Open Archives Initiative (OAI)
Semantic Web
Open Archival Information System (OAIS)
Digital Preservation
Information Discovery

39 References
A Sample TEI Markup: www.tei-c.org/Lite/U5-eg.html
Appendix A.2 Elements in TEI Lite: www.tei-c.org/Lite/U5-taglist.html
OAI: www.openarchives.org/
OAIS: http://www.rlg.org/longterm/oais.html
Learning XML: Erik T. Ray

