Presentation on theme: "Richard Gartner Oxford University"— Presentation transcript:
1 METS and TEI. Richard Gartner, Oxford University
2 Introduction (verbal): METS provides a framework within which any data or metadata can be referenced or embedded. This presentation shows how easily METS and TEI can be used in tandem. The context is an image database with full OCR’d text encoded in TEI.
7 OCR -> TEI: TEI in Libraries level 1, the simplest level of encoding, designed for OCR texts. One <div> element enclosing the complete text. One <p> element within this. Page breaks marked with <pb>.
8 OCR -> TEI (verbal): OCR’d text is put into a skeletal TEI file with a minimal header. Page breaks in the file are replaced with <pb>. A simple stylesheet assigns a sequential ID to each <pb>. Another stylesheet adds <area> elements to the METS structural map, pointing to the <pb> elements.
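The two stylesheet steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the XSLT stylesheets the slides refer to, which are not shown. The ID pattern (`pb.1`, `pb.2`, ...), the file identifier `TEI001`, the `TYPE` values and the sample text are all assumptions made for the example. The `<area>` attributes used (`FILEID`, `BETYPE="IDREF"`, `BEGIN`) are standard METS attributes for pointing at an ID inside an XML file.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Hypothetical TEI fragment: one <div0>, one <p>, page breaks as empty <pb/>.
tei = ET.fromstring(
    "<text><body><div0><p>"
    "<pb/>Page one text<pb/>Page two text"
    "</p></div0></body></text>"
)


def number_page_breaks(root, prefix="pb"):
    """Give every <pb> a sequential id (assumed pattern: prefix.N)."""
    ids = []
    for n, pb in enumerate(root.iter("pb"), start=1):
        pb_id = f"{prefix}.{n}"
        pb.set("id", pb_id)
        ids.append(pb_id)
    return ids


def build_struct_map(pb_ids, tei_file_id="TEI001"):
    """Build a METS structMap with one page <div> per <pb>.

    Each page gets an <fptr>/<area> that points at the <pb> by its ID.
    tei_file_id is a hypothetical ID of the TEI file in the METS <fileSec>.
    """
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    volume = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "volume"})
    for order, pb_id in enumerate(pb_ids, start=1):
        page = ET.SubElement(
            volume, f"{{{METS_NS}}}div", {"TYPE": "page", "ORDER": str(order)}
        )
        fptr = ET.SubElement(page, f"{{{METS_NS}}}fptr")
        ET.SubElement(
            fptr,
            f"{{{METS_NS}}}area",
            {"FILEID": tei_file_id, "BETYPE": "IDREF", "BEGIN": pb_id},
        )
    return struct_map


pb_ids = number_page_breaks(tei)
struct_map = build_struct_map(pb_ids)
print(ET.tostring(tei, encoding="unicode"))
print(ET.tostring(struct_map, encoding="unicode"))
```

Keeping the numbering and the structMap generation as separate functions mirrors the two-stylesheet split described on the slide: the TEI file can be renumbered without touching the METS file, and the METS pointers are regenerated from whatever IDs the TEI file currently carries.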
9 Put your OCR text here!
<?xml version="1.0" encoding="utf-8"?>
<tei.2>
  <teiHeader status="new" type="text">
    <fileDesc>
      <titleStmt><title>modhis006-aab OCR text</title></titleStmt>
      <publicationStmt><publisher>Oxford Digital Library</publisher></publicationStmt>
      <sourceDesc default="NO"><p>OCR text from modhis006-aab</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div0 id="modhis006-aab-aaa.div.1" part="N" sample="complete" org="uniform">
        <p></p>
      </div0>
    </body>
  </text>
</tei.2>
10 □Parliamentary History. VOL. n.□ -> <pb/>Parliamentary History. VOL. n.<pb/>
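The replacement shown on this slide can be sketched in a few lines of Python. This assumes the □ characters stand for form feeds (`\f`) left in the OCR output as page separators. The slides do not say which character the OCR software emits, so treat that as an assumption. A form feed is not a legal character in XML 1.0, so the substitution has to happen on the raw text before it is placed in the TEI file.

```python
from xml.sax.saxutils import escape


def mark_page_breaks(ocr_text: str) -> str:
    """Replace each form feed (assumed page separator) with an empty <pb/>."""
    # Escape &, < and > first so the OCR text is safe inside an XML element;
    # the <pb/> markers are added afterwards so they are not escaped themselves.
    return escape(ocr_text).replace("\f", "<pb/>")


sample = "\fParliamentary History. VOL. n.\f"
marked = mark_page_breaks(sample)
print(marked)
```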
14 Why use METS and TEI together? Images. Overlapping hierarchies.
15 (verbal) Images: up to P4, TEI's image facilities are clumsy. Images have to be referenced through entity references only, with no URLs, URIs etc. There is no way to distinguish between inline images (which the facilities were designed for) and whole-page images. There is no scope for administrative metadata. Overlapping hierarchies: CONCUR was the SGML mechanism for this; it was clumsy to use and is gone in XML. The various other approaches are all distinguished by notational complexity.
17 Overlapping hierarchies: some approaches used with TEI. CONCUR (SGML); MECS (Wittgenstein archive); stand-off markup: XLink mechanisms to impose markup (varying hierarchies); TexMECS; Witt: PROLOG.
18 Images in METS: list all variants of the image files in <fileSec>. Each can have extensive administrative or descriptive metadata attached. Reference them by URLs, URIs etc., or embed them in the METS file. The FILEID attribute (on <fptr> or <area>) in <structMap> indicates the exact correspondence of an image to a part of the item.
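As a rough illustration of this layout, the sketch below builds a small METS document in which each page has several image variants listed in <fileSec> and the <structMap> links each page to its files through FILEID. The element and attribute names are standard METS. The URLs, the `USE` labels, the `TYPE` values and the ID pattern (`master.1`, `thumbnail.1`) are hypothetical, and the administrative metadata mentioned on the slide is left out for brevity.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def build_mets(page_images):
    """page_images: one dict per page, mapping a use label to an image URL."""
    mets = ET.Element(m("mets"))
    file_sec = ET.SubElement(mets, m("fileSec"))
    struct_map = ET.SubElement(mets, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct_map, m("div"), {"TYPE": "volume"})
    groups = {}  # one <fileGrp> per variant, e.g. master / thumbnail
    for page_no, variants in enumerate(page_images, start=1):
        page = ET.SubElement(
            volume, m("div"), {"TYPE": "page", "ORDER": str(page_no)}
        )
        for use, url in variants.items():
            if use not in groups:
                groups[use] = ET.SubElement(file_sec, m("fileGrp"), {"USE": use})
            file_id = f"{use}.{page_no}"  # hypothetical ID pattern
            file_el = ET.SubElement(groups[use], m("file"), {"ID": file_id})
            ET.SubElement(
                file_el,
                m("FLocat"),
                {"LOCTYPE": "URL", f"{{{XLINK}}}href": url},
            )
            # FILEID ties this page of the structural map to the image file.
            ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return mets


mets_doc = build_mets(
    [
        {
            "master": "http://example.org/images/page1.tif",
            "thumbnail": "http://example.org/images/page1.jpg",
        },
        {
            "master": "http://example.org/images/page2.tif",
            "thumbnail": "http://example.org/images/page2.jpg",
        },
    ]
)
print(ET.tostring(mets_doc, encoding="unicode"))
```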