History Data Service December 1st, 2001


Designing Flexible Digital Representations of Historical Source Materials

Digitising History Workshop - © History Data Service

Programme
11:00-11:30 Arrival/Coffee
11:30-12:30 Session 1: Source to Digital Resource
    what is digitisation?
    overview of creating a digital resource
12:30-1:30 Lunch
1:30-2:30 Session 2: Digitisation Methods
    identifying suitable ways of digitising different kinds of sources
    hardware for digitisation projects
    data models for digitised data
    digital formats for digitised data
    limitations of digitisation projects
2:30-3:30 Session 3: Practical Considerations
    practicalities of undertaking a digitisation project
    discussion of specific software 'brands'
    discussion of equipment needs
    documentation
    backup and preservation of digital data
3:30-4:00 Coffee

Example: Virtual Savannah

Example: Making of America (Image)

Example: Making of America (Text)

International Dunhuang Project

Internet Library of Early Journals

Triangle Factory Fire

Session 1: Source to Resource

What is Digitisation?
Digitisation: any means of capturing the information content of a non-digital source in binary coded form
Information content can be anything about a source. Consider a page in a book. Its information content includes:
    the text on the page
    the size and shape of the characters on the page
    the layout of text on the page
    the chemical composition of the paper
    the number of the page
The digitisation process involves separating the information content of the source from the medium which carries that information
The process of digitisation creates a representation of the original source; it does not create a duplicate of the original source
Information may be enhanced or damaged, discarded or added during the digitisation process
Digitisation forces choices about which aspects of the source will be captured in the digital representation of the source

Source - Digitisation - Resource
The ‘input channels’ of digitisation (keyboard, scanner etc.) are narrow and can only capture a small proportion of the source’s information content
    identify aspects of source to digitise
    choose data model
    choose digitisation method

Elements of a Digital Resource
Users: Knowledge, Experience, Culture
Environment: Hardware, Software (OS), (Network)
Digital Objects: Binary Data, Relationships
The environment of a digital resource often receives the most attention, but it is the users and digital objects that are most important
Hardware and software selection should be based on the needs of the users and the types of digital objects to be used

The Three Layer Model
Interpretation Layer
    Incorporates researcher’s knowledge and judgement
    Links records and forms aggregates
Standardisation Layer
    Provides a foundation for analysing the data
    Codes and standardisation rules are applied
Source Layer
    An accurate digital surrogate of the source
    Defines level of detail captured
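One way to picture the three layers is as separate fields kept side by side for each record, so that the source reading is never overwritten by later coding or interpretation. The sketch below is only an illustration under that assumption: the record type, the field names and the occupation coding scheme are all hypothetical and do not come from the workshop.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical coding scheme used by the standardisation layer.
OCCUPATION_CODES = {"blacksmith": "METAL", "smith": "METAL", "weaver": "TEXTILE"}


@dataclass
class OccupationRecord:
    source_text: str                         # source layer: the entry as written
    standard_code: Optional[str] = None      # standardisation layer: coded value
    linked_person_id: Optional[int] = None   # interpretation layer: researcher's link


def standardise(record: OccupationRecord) -> OccupationRecord:
    """Apply the coding rules without altering the source layer."""
    key = record.source_text.strip().lower()
    record.standard_code = OCCUPATION_CODES.get(key)
    return record


record = standardise(OccupationRecord("Weaver"))
record.linked_person_id = 17   # a researcher's judgement that this is person 17
print(record)
```

Keeping the layers apart means a coding rule or a record link can be revised later without going back to the original source.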

Source Analysis
Simplify the source
    Ignore unwanted information
    Exclude certain types of information
Define a sub-set of the remaining information content
    Select information directly from the source or define a set of summarised information based on the source
Model the information content sub-set
    Break information content into discrete elements of information
    Describe the characteristics of each information element
    Describe how information elements relate to each other
Successful source analysis requires a good understanding of the source and of the purpose of the digital resource

An Historical Source
Features labelled on the example source: marginalia, fold line, page size, fonts, issue date, columns, spacing, text, headlines

Exercise
Using one of the example ‘sources’ provided, identify as many pieces of the source’s information content as possible

Session 2: Digitisation Methods

Digitisation: The Information Narrows
Digitisation is a process of information identification, selection, extraction and storage
Digital formats are, in theory, capable of storing more information than can in practice be digitised
Digitisation methods work at a low level: they handle characters, pixels and frequencies, not documents, paintings or music
Digitisation involves simplifying the information content of the original so that it will ‘fit’ into a computer (storage space and processor speeds)
    Text: range of characters recognised
    Images: detail and colour
    Sound: intensity and pitch
    Moving images: detail, colour and frequency of frames
    3D models: surface points sampled, colour

Digitisation Workflow
Diagram labels: Item to Digitise, Copy of Source, Photocopy, Photograph, Transcription, Digitiser tablet, Image Scan, Digital Camera, 3D Scan, OCR, Line tracing, Digital Format processing, Digital Resource, Archive

Bits and Bytes
Computers store all information as bits
A bit can have the value 0, or the value 1
    1 bit = 2 values (0, 1)
    2 bits = 4 values (00, 01, 10, 11)
    4 bits = 16 values (0000, 0001, 0010 … 1110, 1111)
    5 bits = 32 values
    6 bits = 64 values
    7 bits = 128 values
    8 bits = 256 values
8 bits form a byte
A kilobyte (Kb) is roughly 1,000 bytes
A megabyte (Mb) is roughly 1,000,000 bytes (1 million bytes)
A gigabyte (Gb) is roughly 1,000,000,000 (1 billion) bytes
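The doubling pattern above (each extra bit doubles the number of values) can be checked with a few lines of Python. This is a small illustrative sketch; the function names are chosen here and are not part of the workshop material.

```python
def values_for_bits(n_bits: int) -> int:
    """Number of distinct values that n_bits binary digits can represent."""
    return 2 ** n_bits


def all_patterns(n_bits: int) -> list[str]:
    """Every bit pattern of the given width, in counting order."""
    return [format(i, f"0{n_bits}b") for i in range(values_for_bits(n_bits))]


for n in (1, 2, 4, 8):
    print(n, "bits =", values_for_bits(n), "values")

print(all_patterns(2))
```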

File Compression
Many forms of digitisation create very large amounts of data
Data compression techniques are used to reduce the size of files created by digitisation
    Remove redundant information: for example, runs of repeated characters (length 10) can be recoded as values and repeat counts (length 6)
    Remove unnecessary detail
Lossless compression reduces file size without discarding any information
Lossy compression reduces file size more, but does so by discarding some information
Compressed files must be decompressed before use
    For very large files this will impose a noticeable time delay
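Removing redundant information can be sketched with run-length encoding, one simple lossless technique (assumed here for illustration; the slide does not name a specific method). The sample string is hypothetical. The second half uses Python's standard `zlib` module to show that a lossless round trip returns exactly the original bytes.

```python
import zlib
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of a repeated character with (character, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, run length) pairs back into the original text."""
    return "".join(ch * count for ch, count in pairs)


sample = "aaaaabbbbb"            # hypothetical input of length 10
encoded = rle_encode(sample)     # two (character, count) pairs
assert rle_decode(encoded) == sample   # lossless: nothing was discarded

# A general-purpose lossless compressor from the standard library.
raw = b"History Data Service " * 200
packed = zlib.compress(raw)
assert zlib.decompress(packed) == raw
print(len(raw), "bytes ->", len(packed), "bytes")
```

Lossy formats such as JPEG or MP3 cannot pass the round-trip check above, because they deliberately discard information.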

Common Digitisation Tools
Keyboard - captures predefined symbols that are stored as machine readable text codes
Scanner - captures a bitmap image of a flat sheet placed on the scanner (can be used to digitise text when combined with OCR)
Digital camera - captures an image as a bitmap image instead of on film
    more versatile than a scanner as the item does not have to be placed on the camera to be digitised

Keyboard
The keyboard is often forgotten, but it is a very effective digitisation tool
Digitises precisely defined symbols
Difficult to capture a large number of different symbols (only so many keys on the keyboard)
QWERTY keyboard layout is inefficient (an alternative is the Dvorak keyboard layout - with Windows you can swap to Dvorak by changing a setting in the keyboard control panel)
Key presses are passed to the computer and then translated into text codes such as ASCII or UNICODE codes

Good Practice: Text Transcription
Advantages:
    Low overhead to start transcription: person, keyboard, document
    Hand-written documents can be transcribed
    Transcriber can follow complex disorganised documents
Disadvantages:
    Slow and expensive
    Human error
Good practice:
    Double entry (two transcribers both enter the same document and the transcriptions are checked for differences)
    Keep copies of originals with transcriptions (preferably as digital images as this makes post-transcription checking simple and quick)
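The double-entry check can be automated by comparing the two transcriptions line by line and listing the places where they disagree. The following is a minimal sketch assuming each transcription is plain text with one source line per text line; the function name and the deliberately introduced discrepancy are illustrative only.

```python
from itertools import zip_longest


def transcription_differences(first: str, second: str) -> list[tuple[int, str, str]]:
    """Return (line number, first version, second version) for each mismatch."""
    mismatches = []
    pairs = zip_longest(first.splitlines(), second.splitlines(), fillvalue="")
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            mismatches.append((number, a, b))
    return mismatches


transcriber_a = "In th' olde dayes of the kyng arthour,\nOf which that britons speken greet honour,"
transcriber_b = "In th' olde dayes of the kyng arthour,\nOf which that britons speken grete honour,"

for number, a, b in transcription_differences(transcriber_a, transcriber_b):
    print(f"line {number}: {a!r} / {b!r}")
```

Each reported line still has to be resolved by a person looking at the original (or its image); the comparison only finds the disagreements.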

Optical Character Recognition
Advantages:
    Automatic, suitable for digitising large numbers of documents
    Highly accurate for clean, clear type-written documents
Disadvantages:
    Current technology is very poor on hand-writing
    Complex document layout can become scrambled
Good practice:
    Proof-read OCR output for errors
    Provide image of page with text so users can check the text themselves

Machine Readable Text Standards
ASCII is the most well known
    ASCII itself is a 7-bit code that defines 128 characters; it is normally stored using 1 byte per character
    1 byte allows up to 256 different codes, so 8-bit extensions of ASCII can represent up to 256 different characters
    A lot of ‘ASCII’ text is actually stored in another standard that is essentially compatible with ASCII: the Windows, DOS and Mac standards, for example
    Different languages are handled by having multiple code pages
A code page is a set of characters associated with the 256 codes available. With more than one code page, different characters can be associated with the same code. The first 128 characters are usually the same; differences occur in codes 128 to 255
UNICODE
    A newer text standard, originally designed to use 2 bytes to store each character, providing much more space
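The effect of code pages can be seen by decoding the same two bytes under different 8-bit standards. The sketch below uses three codecs that ship with Python (`cp1252` for Windows, `cp437` for DOS, `mac_roman` for classic Mac); the choice of byte values is just an example. The code below 128 decodes identically everywhere, while the code above 127 does not.

```python
def decode_with(code_page: str, data: bytes) -> str:
    """Interpret the same bytes using the named code page."""
    return data.decode(code_page)


data = bytes([0x41, 0xE9])   # 0x41 is below 128, 0xE9 is in the 128-255 range

decoded = {page: decode_with(page, data) for page in ("cp1252", "cp437", "mac_roman")}
for page, text in decoded.items():
    # ascii() keeps the output printable on any console
    print(page, "->", ascii(text))

# A 2-byte Unicode encoding stores the same accented letter in two bytes.
two_bytes = "\u00e9".encode("utf-16-le")
print(len(two_bytes), "bytes")
```

This is why a text file should always be accompanied by a note of the character encoding it was saved in.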

Image Digitisation
Images can be digitised using either a scanner or a digital camera
The digital object created is stored in the computer as a picture made up of many pixels
Pixel: short for picture element
A bitmap image is an array of pixels aligned in rows and columns

Scanner
Scanners create an image by shining light onto a page and measuring the light reflected into a line of light sensitive receptors
Look for the optical resolution of a scanner: this is determined by the number of light sensitive receptors it has and the size of its steps along the page
Some scanners claim higher interpolated resolutions, but this is achieved by guessing the value of pixels in between the light sensitive receptors
CCD components produce better images than CIS, but CIS is cheaper, lighter and smaller
The quality of an image will be affected by factors other than resolution
    Optical components: glass is better than plastic
    Light source: cheap scanners may use a fluorescent light (off-white)
Specialised types of scanner are available for scanning film and microfiche

Resolution
Resolution is normally expressed as dots per inch, but the effective resolution is less than the stated scanner resolution
Better ways of measuring resolution refer to the smallest discernible features in an image
A 300dpi image does not mean that features 1/300th of an inch in size will be visible

Colour Depth
Number of bits    Possible colours    Typical use
1                 2                   Black and white (bitonal)
4                 16                  16 shades of grey
8                 256                 256 colours or shades of grey
24                16,777,216          ‘True colour’: 8 bits each for the red, green and blue components of colour

Colour Gamut
The colour gamut is the total range, or colour space, that a monitor or printer can display
Monitors create colours by mixing red, green and blue (RGB) luminous phosphors
Printers create colours by mixing cyan, magenta, yellow and black (CMYK) inks
The colours that can be created by the two systems are different
Source: http://www.adobe.com/support/techguides/color/colormodels/rgbcmy.html

Uncompressed Image Size
Calculate area of original in units
Calculate number of pixels in a unit area
Multiply area by pixels
Multiply result by colour depth (in bits)
Result is in bits; divide by 8 to get bytes
size in bits = [ (original width x original height) x resolution² ] x colour depth
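The calculation above translates directly into a short function. This sketch assumes width and height are measured in inches and resolution in dots per inch; the function and parameter names are chosen here for illustration.

```python
def uncompressed_image_bytes(width_in: float, height_in: float,
                             dpi: int, colour_depth_bits: int) -> float:
    """Size in bytes of an uncompressed bitmap of the given original."""
    area = width_in * height_in          # area of the original in square inches
    pixels_per_unit_area = dpi ** 2      # pixels in one square inch
    bits = area * pixels_per_unit_area * colour_depth_bits
    return bits / 8                      # 8 bits to a byte


# A 1 x 1 inch original at 300 dpi
for depth in (1, 8, 24):
    print(depth, "bit:", uncompressed_image_bytes(1, 1, 300, depth), "bytes")
```

Doubling the resolution quadruples the file size, because the number of pixels grows with the square of the resolution.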

Some Uncompressed Image Sizes
Width (in)    Height (in)    Resolution (dpi)    Colour depth (bits)    Size
1             1              300                 1                      11 Kb
1             1              300                 8                      90 Kb
1             1              300                 24                     270 Kb
1             1              600                 1                      45 Kb
1             1              600                 8                      360 Kb
1             1              600                 24                     1 Mb
10            8              300                 1                      0.9 Mb
10            8              300                 8                      7.2 Mb
10            8              300                 24                     21.6 Mb
10            8              600                 1                      3.6 Mb
10            8              600                 8                      28.8 Mb
10            8              600                 24                     86.4 Mb
The largest image is about 130 times the size of the complete plays of Shakespeare!

Digital Camera
Digital cameras capture an image on an array of light sensitive receptors (pixels)
Each pixel captures brightness only: colour is captured by using coloured filters to sample the brightness of light in each of the three primary colours (red, green, blue) separately
The full colour of each pixel is calculated by averaging the various red, green and blue intensities of nearby pixels
    green filtered pixels are twice as common as blue or red because the human eye is more sensitive to green
Resolution is often quoted in ‘megapixels’ (horizontal number of pixels x vertical number of pixels)
Images are stored in a compressed form

Image Compression
JPEG: lossy compression suitable for preview images and web delivery
JPEG reduces file size by discarding visual information that the human eye does not register
Because information is discarded, you cannot recreate the original uncompressed digital image from a JPEG image file
Lossless image formats also exist; TIF and PNG are two good choices
Avoid using proprietary extensions to the TIF format

Scanning Images: Good Practice
Advantages:
    Accurate visual representation of the source
Disadvantages:
    Text and logical structure of a document is not captured (can be through OCR)
Good practice:
    Scan master images at highest appropriate resolution and bit depth
    Create derivative images from the master image for specific purposes
    Record details of scanner settings and any image editing done afterwards

Capturing Audio
Human hearing
    Frequency (pitch): roughly 20 Hz to 20,000 Hz (20 kHz)
    Intensity (loudness): 0 to 120 dB
Nyquist rate: to capture a sound without losing frequency information, the sampling rate should be at least twice the maximum audio frequency
Full sound reproduction requires digitisation at more than 40,000 samples a second
Good quality sound is often digitised at 44,100 samples a second
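The Nyquist rule can be illustrated numerically. The sketch below computes the minimum sampling rate for a given maximum frequency and, using the standard folding formula for a pure tone, the frequency that a tone appears to have after sampling. The function names and the example tones are illustrative assumptions, not part of the workshop.

```python
def minimum_sampling_rate(max_frequency_hz: float) -> float:
    """Nyquist rate: at least twice the highest frequency to be captured."""
    return 2 * max_frequency_hz


def apparent_frequency(tone_hz: float, sampling_rate_hz: float) -> float:
    """Frequency a pure tone appears to have once it has been sampled."""
    nearest_multiple = round(tone_hz / sampling_rate_hz) * sampling_rate_hz
    return abs(tone_hz - nearest_multiple)


print(minimum_sampling_rate(20_000))          # upper limit of human hearing
print(apparent_frequency(10_000, 44_100))     # below half the sampling rate: unchanged
print(apparent_frequency(30_000, 44_100))     # above half the sampling rate: aliased
```

A tone above half the sampling rate is not simply lost; it reappears as a false lower tone, which is why audio is filtered before it is sampled.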

Audio Data Sampling Rates
Samples are typically 8 or 16 bits in size; stereo sound needs a sample for each of its two channels
44,100 samples/second x 16 bits per sample x 2 channels = 1,411,200 bits per second!
One second of good quality uncompressed digital sound is equivalent to ¼ of the complete plays of Shakespeare
Good compression is vital
Source       Sample rate (thousands per second)    Sample size (bits)    Data rate (Kb per second)
Telephone    8.00                                  8                     8.0
FM radio     22.05                                 16                    88.2
CD           44.10                                 16                    176.4
DAT          48.00                                 16                    192.0
DVD          192.00                                24                    576.0
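The data-rate arithmetic can be written as a small function: samples per second, times bits per sample, times channels. The names below are chosen for illustration only.

```python
def audio_bits_per_second(sample_rate_hz: int, sample_size_bits: int, channels: int) -> int:
    """Uncompressed audio data rate in bits per second."""
    return sample_rate_hz * sample_size_bits * channels


cd_bits = audio_bits_per_second(44_100, 16, 2)
cd_bytes = cd_bits // 8
print(cd_bits, "bits per second")
print(cd_bytes, "bytes per second")
print(cd_bytes * 60, "bytes per minute")
```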

Audio Compression
Similar to lossy image compression
    There are certain sounds that the human ear cannot hear
    There are certain sounds that the human ear hears much better than others
    If there are two sounds playing simultaneously, we hear the louder one but cannot hear the softer one
MP3 uses knowledge of human hearing and sound comprehension to discard the least important information
    For example, quiet sounds playing behind loud sounds can be discarded

Digitising Audio: Good Practice
Advantages:
    Flexibility
Disadvantages:
    Working with uncompressed audio takes lots of disk space
Good practice:
    Avoid switching between digital and analogue
    Keep editing to a minimum to avoid creating unintended sound artefacts in the audio

Capturing Moving Images
Moving images are created by playing a series of still images in rapid sequence
Digitisation of moving images is similar to digitising still images, except for compression techniques
1 second of uncompressed good quality digital video (without sound) is equivalent to about ¾ of the complete plays of Shakespeare
Different standards (width and height in pixels, number of frames) for television broadcast (PAL and NTSC) affect the size of digitised frames

Moving Image Compression
Moving images are a sequence of still images, so look for information that is repeated between frames
Start with a key frame and then just store the differences between the key frame and subsequent frames
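The key-frame idea can be sketched with frames represented as flat lists of pixel values: only the pixels that differ from the key frame are stored for each later frame. This is a deliberate simplification under assumed data (real video codecs are far more elaborate), and the function names and tiny frames are hypothetical.

```python
def frame_difference(key_frame: list[int], frame: list[int]) -> dict[int, int]:
    """Store only the pixels that changed: {pixel position: new value}."""
    return {i: new for i, (old, new) in enumerate(zip(key_frame, frame)) if new != old}


def rebuild_frame(key_frame: list[int], difference: dict[int, int]) -> list[int]:
    """Recreate a frame from the key frame plus its stored differences."""
    frame = list(key_frame)
    for position, value in difference.items():
        frame[position] = value
    return frame


key_frame = [0, 0, 0, 0, 9, 9, 0, 0]     # hypothetical 8-pixel frame
next_frame = [0, 0, 0, 0, 0, 9, 9, 0]    # the bright area has moved one pixel

difference = frame_difference(key_frame, next_frame)
print(difference)                         # only two pixels need storing
assert rebuild_frame(key_frame, difference) == next_frame
```

The saving depends on how little changes between frames: a static scene compresses very well, a rapid cut hardly at all.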

Other Digitisation Tools
Digitiser tablet - captures a vector image of a sheet of the original document
    An active tablet that is connected to a puck or pen and captures a series of (x,y) coordinate positions: used for digitising vector based images like maps, plans and charts
3D scanning samples the three dimensional surface of an object
    Contact probes or laser range finding are common methods used to determine the position in space of a point on the surface of an object
    See http://graphics.stanford.edu/projects/mich/ for an interesting example

Data Models
Data models are abstract ways of structuring information
File formats are specific ways of implementing a data model
    A Word97 document and a WordPerfect 8.0 document are different file formats that both implement similar data models of ‘a document’
The information content of a source can usually be represented by a number of different data models
The source and the intended purpose of the digital resource should determine the most appropriate data model to use
Once a data model has been selected it should be possible to store data in a number of file formats as required
To be useful, digital data must be:
    Organised according to an appropriate data model
    Stored in a file format that can represent the data model
    Used in an application that understands the data model and file format in the desired way (try opening an HTML file in a web browser and an ordinary text editor, notice the difference)

Database
Databases organise information as discrete chunks of data, each with a particular data type
    the original structure of the source is destroyed as information is extracted and reorganised to fit into the database’s data model
Databases are a mature technology well supported by software applications
    flat file databases store a single table of information
    relational databases can link information in different tables together
    historians may also be interested in object and object-relational databases, as these enable closer modelling of the content and structure of the original source
The relational data model is the most common approach to organising information in a database
Data types are different categories of information. They are used to help manage the storage and interpretation of data in a database. A database typically has data types for characters, long texts, numbers, dates and time and BLOBs (Binary Large Objects)

Database Table Structure
The field is the basic unit of data in a database. A field stores a single piece of information of a particular data type.
Fields are combined to form records.
A set of records with the same fields are collected together in a table.
The order of fields in a record and the order of records in a table have no significance

Entity Relationship Modelling
A data modelling technique that transforms information into a form that meets the requirements of the relational data model
Entities are the things that the database will contain a representation of
    Entities can be anything: people, places, events, physical objects, or concepts
    All the entities with the same characteristics can be collectively called an entity type
Relationships describe the way entities are connected to each other

Database Relationships
One to one relationships connect one entity to one other entity
One to many relationships connect one entity to one or more other entities
Many to many relationships connect many entities to many other entities
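A one-to-many relationship can be sketched with Python's built-in `sqlite3` module. The two entity types (parish and person), their fields and the sample rows are hypothetical and chosen only to illustrate the idea: each person record carries the identifier of one parish, and one parish can be referred to by many people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary in-memory database
conn.executescript("""
    CREATE TABLE parish (
        parish_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        surname   TEXT NOT NULL,
        parish_id INTEGER NOT NULL REFERENCES parish(parish_id)
    );
""")

# Hypothetical records: one parish, many people.
conn.execute("INSERT INTO parish VALUES (1, 'St Mary')")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Smith", 1), (2, "Jones", 1)])

# Follow the relationship from each person back to their parish.
rows = conn.execute("""
    SELECT parish.name, person.surname
    FROM parish JOIN person ON person.parish_id = parish.parish_id
    ORDER BY person.surname
""").fetchall()
print(rows)
```

A many-to-many relationship is usually handled in the same way with a third, linking table that holds pairs of identifiers.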

Database Example

Mark-up
Mark-up organises and structures data by inserting tags into a text document
If the tags are ignored the flow of the original document is unaltered
Mark-up can be used for different purposes:
    Presentation mark-up describes the layout of a document: italics, margins, bold, fonts etc
    Structural mark-up is used to make the logical structure of a document explicit: chapter headings, paragraphs, captions, lists etc
The main languages used to define mark-up are SGML (Standard Generalised Mark-up Language) and XML (eXtensible Mark-up Language)
SGML and XML include Document Type Definitions (DTD). A DTD defines a set of tags and the relationships between them that can be used to mark-up a particular type of document
HTML is the SGML DTD used to define the structure of web pages

Document Type Definition (DTD)
Elements are the basic unit of text in mark-up
The DTD defines valid elements and how they relate to each other
Elements are usually indicated by a start tag and an end tag
    <anelement>...text...</anelement>
Elements are identified by name, but their meaning and how they should be used is not part of the DTD
Elements can be related to each other in a number of ways
    Elements can occur within other elements
    Elements can contain other elements
    The optionality, order and other aspects of document structure are also defined in the DTD
A DTD can be stored with each file, or a single DTD can be referred to from many files (using a URL, as is the case for HTML)

Example: Simple Text Transcription

Example: Mark-up Text
…
</DIV1>
<DIV1 N="WBT" TYPE="tale">
<HEAD>The Wife of Bath's Tale</HEAD>
<L N="857">In th' olde dayes of the kyng arthour, </L>
<L N="858">Of which that britons speken greet honour, </L>
<L N="859">Al was this land fulfild of fayerye. </L>
<L N="860">The elf-queene, with hir joly compaignye, </L>
<L N="861">Daunced ful ofte in many a grene mede. </L>
<L N="862">This was the olde opinion, as I rede; </L>
<L N="863">I speke of manye hundred yeres ago. </L>
<L N="864">But now kan no man se none elves mo, </L>
<L N="865">For now the grete charitee and prayers </L>
<L N="876">And seyth his matyns and his hooly thynges </L>
<L N="877">As he gooth in his lymytacioun. </L>
<L N="878">Wommen may go now saufly up and doun. </L>
<L N="879">In every bussh or under every tree </L>
<L N="880">Ther is noon oother incubus but he, </L>
<L N="881">And he ne wol doon hem but dishonour. </L>
<L N="882">And so bifel it that this kyng arthour </L>
<L N="883">Hadde in his hous a lusty bacheler, </L>
<L N="884">That on a day cam ridynge fro ryver; <PB N="85"> </L>
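Structural mark-up of this kind can be read by software. The sketch below uses Python's standard `xml.etree.ElementTree` parser on a shortened, well-formed XML adaptation of the example (an assumption made here: the SGML excerpt on the slide is not complete XML, so only a closed two-line fragment is used). It shows line numbers being recovered from the tags, and the plain text reappearing unaltered when the tags are ignored.

```python
import xml.etree.ElementTree as ET

# Shortened XML adaptation of the marked-up text (illustrative only).
marked_up = """<DIV1 N="WBT" TYPE="tale">
<HEAD>The Wife of Bath's Tale</HEAD>
<L N="857">In th' olde dayes of the kyng arthour,</L>
<L N="858">Of which that britons speken greet honour,</L>
</DIV1>"""

root = ET.fromstring(marked_up)

# Use the structure: look up each verse line by its N attribute.
lines_by_number = {line.get("N"): line.text for line in root.iter("L")}
print(lines_by_number["858"])

# Ignore the tags: the flow of the original text is unaltered.
plain_text = "\n".join(t.strip() for t in root.itertext() if t.strip())
print(plain_text)
```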

GIS
GIS (Geographic Information System) organises data spatially
Different themes of data can be layered one on top of the other
A variety of spatial overlay techniques can be used to combine the information
GIS can use either a raster or a vector data model
    A raster is similar to a bitmap image
GIS is often used in association with a database

Vector Data Formats
Vector data is based on points - (x,y) or (x,y,z) - and connections between points
A point represents an exact location in two or three dimensional space
Two points define a line
A series of connected lines define an area
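Because vector data is just points and connections, measurements can be calculated directly from the coordinates. The following sketch assumes two-dimensional points in arbitrary map units; the field boundary is a hypothetical example, and the area calculation uses the standard shoelace formula for a simple (non-self-crossing) outline.

```python
import math

Point = tuple[float, float]


def line_length(a: Point, b: Point) -> float:
    """Two points define a line; this is its length."""
    return math.dist(a, b)


def polygon_area(points: list[Point]) -> float:
    """Area enclosed by a series of connected lines (shoelace formula)."""
    total = 0.0
    # Pair each point with the next one, wrapping round to close the outline.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


field = [(0, 0), (4, 0), (4, 3), (0, 3)]   # hypothetical field boundary
print(line_length(field[0], field[2]))     # diagonal of the field
print(polygon_area(field))
```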

Example: Map

Overlapping Models
Often one data model can be represented using another data model
The elements of a mark-up document can be stored as fields in a database
A database can be stored using a mark-up DTD
SVG (Scalable Vector Graphics) is a mark-up DTD for storing vector-based images
There are usually many file formats that can be used to represent a single data model: selecting the right data model is much more important than selecting a particular file format
Choice of file format will follow from choice of software that suits your requirements
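To make the overlap concrete, here is a minimal sketch of storing mark-up elements as database fields. It assumes a simplified, well-formed XML version of the verse mark-up shown earlier; the table and column names are hypothetical, and only Python's standard `xml.etree.ElementTree` and `sqlite3` modules are used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumption: a short fragment of the verse mark-up, tidied into well-formed XML.
MARKUP = """
<DIV1 N="WBT" TYPE="tale">
  <HEAD>The Wife of Bath's Tale</HEAD>
  <L N="857">In th' olde dayes of the kyng arthour,</L>
  <L N="858">Of which that britons speken greet honour,</L>
  <L N="859">Al was this land fulfild of fayerye.</L>
</DIV1>
"""

root = ET.fromstring(MARKUP.strip())

# Each <L> element becomes one row: tale code, line number, line text.
rows = [
    (root.get("N"), int(line.get("N")), line.text.strip())
    for line in root.iter("L")
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE verse (tale TEXT, line_no INTEGER, text TEXT)")
conn.executemany("INSERT INTO verse VALUES (?, ?, ?)", rows)

# The same content can now be queried as ordinary database fields.
result = conn.execute("SELECT text FROM verse WHERE line_no = 858").fetchone()
print(result[0])
```

The information is the same in both forms; only the data model used to hold it has changed.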

Digitisation: A Balancing Act
Successful digitisation involves several trade-offs:
  Amount and detail versus time and cost of digitisation
  Complexity of the digital resource versus ease of use and understanding
  Flexibility of the digital resource versus suitability for a specific use
  Feasibility of digitisation with current technology versus future possibilities for digitisation
Choices of what to digitise and how to digitise should be guided by a firm understanding of the source and the intended purpose of the digital resource
Do not exceed the limits of available support (financial, technical, equipment, labour)
Always try to preserve the information content of the source
Keep information that tracks the origin and history of the digital resource with the digital resource

Session 3: Practical Considerations

Project Management
A large team can be involved in a digital resource creation project, and good project management is vital in ensuring that all their work is coordinated and delivered on time
  Clear management structure
  Clear allocation of responsibilities
  Detailed timetable based on realistic estimates of time needed for each task
  Procedure for tracking progress and dealing with problems
Identifying all the interdependencies between tasks in your project allows you to plot the critical path of the project. The critical path is the sequence of tasks that must all be completed on time for the entire project to be completed on time. A delay to any task on the critical path will delay later tasks, and the entire project will fall behind schedule.
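One way to plot a critical path is to work out the earliest finish time of every task from its prerequisites and then trace back the longest chain. The sketch below does this for a small, entirely hypothetical digitisation project; the task names and durations are assumptions, and the code expects the dependencies to contain no cycles.

```python
# Hypothetical tasks: name -> (duration in days, prerequisite tasks).
TASKS = {
    "select sources": (5, []),
    "design database": (10, ["select sources"]),
    "scan images": (15, ["select sources"]),
    "data entry": (30, ["design database"]),
    "write documentation": (5, ["data entry", "scan images"]),
}


def critical_path(tasks):
    """Return (tasks on the critical path in order, total project duration)."""
    finish = {}    # earliest finish time of each task
    previous = {}  # the prerequisite that determines when each task can start

    def earliest_finish(name):
        if name not in finish:
            duration, prereqs = tasks[name]
            start, driver = 0, None
            for prereq in prereqs:
                done = earliest_finish(prereq)
                if done > start:
                    start, driver = done, prereq
            finish[name] = start + duration
            previous[name] = driver
        return finish[name]

    # The task that finishes last ends the critical path; walk back from it.
    last = max(tasks, key=earliest_finish)
    path = []
    while last is not None:
        path.append(last)
        last = previous[last]
    return list(reversed(path)), finish[path[0]]


path, total_days = critical_path(TASKS)
print(" -> ".join(path))
print(total_days, "days")
```

In this example "scan images" is not on the critical path, so it could slip by some days without delaying the project, whereas any delay to "data entry" would.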

Documentation
The maintenance of comprehensive documentation detailing the resource creation process and the steps taken involves a significant but profitable investment of time and resources
It is more effective if documentation is generated during rather than after a resource creation project
Such an approach will result in a better quality digital resource, as well as better quality documentation, because the maintenance of proper documentation demands consistency and attention to detail
The process of documenting a resource creation project can also have the benefit of helping to refine research questions and it can be a vital aid to communication in larger projects

Why is Good Documentation Important?
Good documentation is crucial to a digital resource’s long-term vitality
Without it the resource will not be suitable for future use and its provenance will be lost
Proper documentation contributes substantially to a digital resource's scholarly value

What Should it Include?
At a minimum, documentation should provide information about a digital resource’s:
  Contents
  Provenance
    Who created the digital resource and why?
    How was the digital resource created?
    Which sources were used to create the digital resource?
  Structure
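The minimum headings above could be kept as a simple structured record so that gaps are easy to spot while the project is still running. This is only a sketch: the class names, field names and example values are hypothetical and do not represent a History Data Service template.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Provenance:
    creator: str = ""   # who created the resource
    purpose: str = ""   # why it was created
    method: str = ""    # how it was created
    sources: list = field(default_factory=list)  # which sources were used


@dataclass
class Documentation:
    contents: str = ""
    structure: str = ""
    provenance: Provenance = field(default_factory=Provenance)


def missing_fields(doc):
    """List the documentation fields that are still empty."""
    flat = asdict(doc)
    provenance = flat.pop("provenance")
    flat.update({f"provenance.{key}": value for key, value in provenance.items()})
    return sorted(name for name, value in flat.items() if not value)


# Hypothetical, partly completed record.
doc = Documentation(
    contents="Transcribed parish register entries",
    provenance=Provenance(
        creator="A. Researcher",
        sources=["Parish register (hypothetical)"],
    ),
)
print(missing_fields(doc))
```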

What Should it Include?
It needs to be sufficiently detailed to allow the creator to use the resource in the future when the creation process has started to fade from memory
It also needs to be comprehensive enough to enable others to explore the resource fully, and detailed enough to allow someone who has not been involved in the creation process to understand the resource and the process by which it was created

Selecting Software & Hardware
Remember that there is nearly always more than one way of doing something with a computer
Define what you need to do, then seek technical advice
Seek a second opinion!
Technical support staff will often suggest what is most convenient for them, not necessarily you
Commercial companies obviously have their own motives
Look for software that supports common standards
Avoid little-used software with proprietary features
Search for freeware and shareware utilities and tools on the web
Recognise that hardware may need to be replaced in 2 or 3 years

Backup
Everybody working with computers should back up their data
Make multiple backup copies
Make a backup in a different file format
Don’t rely entirely on your organisation’s backup policy
Use more than one type of media
Don’t reuse media too many times!
Store at least one backup off-site, and at least one backup on-site
Check that your backup works!
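One simple way to check a backup is to compare checksums of the original file and its copy. The sketch below uses only the Python standard library; the file name, folder name and helper functions are hypothetical, and the demonstration runs in a temporary directory rather than on real data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy a file into backup_dir and confirm the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    return sha256_of(source) == sha256_of(target)


# Demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as work:
    original = Path(work) / "census.csv"
    original.write_text("name,age\nAlice,34\n", encoding="utf-8")
    ok = backup_and_verify(original, Path(work) / "backup")
    print("backup verified:", ok)
```

A matching checksum shows only that the copy is identical to the original; it is still worth opening a restored file in the software you actually use.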

Digital Preservation
To remain usable, digital resources must have suitable hardware and software
Digital technology develops so quickly that resources can become unusable within a few years if they are not actively preserved
The digital resource should be hardware and software independent to ensure that it remains usable
Use neutral data formats. These are formats that are widely accepted, are not controlled by a single organisation, and have a publicly available definition
Use commonly accepted formats in preference to specialist formats. Avoid relying on special features of particular software or hardware that cannot be adequately replicated in other settings
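As one possible illustration of moving data into a widely accepted format, the sketch below exports a database table to plain comma-separated text. Treating CSV as the neutral format is an assumption made for this example, and the table, column names and records are invented.

```python
import csv
import io
import sqlite3

# A small hypothetical table held in a database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, occupation TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Ann Lee", "weaver", 1851), ("Tom Hart", "farmer", 1861)],
)


def export_csv(connection, table, out):
    """Write every row of a table to `out` as CSV, with a header row.

    The table name is inserted directly into the query, so it must come
    from a trusted source, as it does in this sketch.
    """
    cursor = connection.execute(f"SELECT * FROM {table} ORDER BY rowid")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


buffer = io.StringIO()
export_csv(conn, "person", buffer)
print(buffer.getvalue())
```

The exported text can be read by most database, spreadsheet and statistics packages, so the data no longer depends on the software that created it.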

Flexible Digital Resources
A flexible digital resource is one that can support many different uses
  Preservation - an accurate representation of the source material
  Research - codes, indexes, categorisation
  Access - many different users in different settings
Who will use the resource? How will they use it?
Are there discipline specific methods of describing, categorising, and coding information that should be used?
Can the resource be searched at appropriate levels of detail?
The digital resource should be well documented
The digital resource should adhere to standards and avoid reliance on unusual features of software or hardware