History Data Service1 Good Design for Historical source based Databases History Data Service Hamish James
History Data Service2 Databases A database is a computerised record keeping system. A DataBase Management System (DBMS) is a computer application built around a database that provides a flexible way of storing, manipulating, and examining data. –A DBMS consists of data, hardware, software, and users A DBMS on a personal computer will provide facilities for: –inputting data, modifying, retrieving and deleting data –querying the data (SQL) –producing reports based on the data –building front-ends for users
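The facilities listed above (inputting, modifying, retrieving, deleting and querying data) can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `person` table, its columns and the sample rows are invented for the example and do not come from the presentation.

```python
import sqlite3

# In-memory database; the table, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT, birth_year INTEGER)"
)

# Input data.
conn.executemany(
    "INSERT INTO person (surname, birth_year) VALUES (?, ?)",
    [("Smith", 1821), ("Jones", 1835), ("Brown", 1840)],
)

# Modify a record.
conn.execute("UPDATE person SET birth_year = 1836 WHERE surname = 'Jones'")

# Delete a record.
conn.execute("DELETE FROM person WHERE surname = 'Brown'")

# Query the data with SQL.
rows = conn.execute(
    "SELECT surname, birth_year FROM person ORDER BY surname"
).fetchall()
print(rows)
```

Report generation and front-ends are not shown; a desktop DBMS would typically provide these through its own interface.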
History Data Service3 Data Models Data models are abstract definitions of structures and relationships used to organise data in a database. Data models can be characterised by how they organise the connections between different records: –flat file –hierarchical –network –relational –object orientated Most DBMSs available for personal computers are either flat file or relational.
History Data Service4 Entity Relationship Modelling A data modelling technique that transforms information into a form that meets the requirements of the relational data model. Entities are the things that the database will contain a representation of. –Entities can be anything; people, places, events, physical objects, or concepts. –All the entities with the same characteristics can be collectively called an entity type. Relationships describe the way entities are connected to each other.
History Data Service5 Relationships one to one relationships connect one entity to one other entity. one to many relationships connect one entity to one or more other entities. many to many relationships connect many entities to many other entities.
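A one-to-many relationship is commonly represented in a relational database by a foreign key on the "many" side. The sketch below assumes hypothetical `household` and `person` tables with invented rows; it is one possible layout, not one taken from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references
conn.executescript("""
CREATE TABLE household (id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    household_id INTEGER REFERENCES household(id)  -- the "many" side points at the "one" side
);
""")

# Hypothetical sample data: two households, three people.
conn.executemany("INSERT INTO household VALUES (?, ?)",
                 [(1, "3 Mill Lane"), (2, "7 High Street")])
conn.executemany("INSERT INTO person (name, household_id) VALUES (?, ?)",
                 [("John Smith", 1), ("Mary Smith", 1), ("Ann Jones", 2)])

# One household is connected to one or more people.
counts = conn.execute("""
    SELECT h.address, COUNT(p.id)
    FROM household h LEFT JOIN person p ON p.household_id = h.id
    GROUP BY h.id
    ORDER BY h.id
""").fetchall()
print(counts)
```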
History Data Service6 Data The field is the basic unit of data in a database. A field stores a single piece of information of a particular data type. Fields are combined to form records. A record matches an entity. A set of records with the same fields is collected together in a table.
History Data Service7 Historical Uses for a Database To store and organise large amounts of information automatically. To provide easy access to the information contained in the original source. An environment for manipulating (changing and adjusting) the source. To search/filter/summarise complex information quickly.
History Data Service8 Historical Database Example
History Data Service9 Historical Databases Technical decisions are often the least important. Historians work with information they do not control. –Incomplete, poorly structured information of varying quality. A historical source based database is a representation of the primary source, but it is not an exact replica of the primary source. –Some information may be left out. –Some extra information may be included. A historical source based database mixes elements of a primary source with elements of a secondary source.
History Data Service10 The Three Layer Model Source Layer: an accurate digital representation of the source; defines the level of detail captured. Standardisation Layer: provides a foundation for analysing the data; codes and standardisation rules are applied. Interpretation Layer: incorporates the researcher's knowledge and judgement; links records and forms aggregates.
History Data Service11 Three Layer Design Examples
History Data Service12 Simple Design Hints Make sure the smallest unit of data matches the smallest unit of analysis. –If you want to look at people by last name then have separate first and last name fields, not just a name field. Don't mix data types. –Separate numbers and words. Document everything you do, either in the database or with the database. –Data entry, data standardisation and coding, data transformations, limits of data etc. –Keep information that tracks the origin and history of the database. Add information, don't delete information.
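The first, second and last hints can be illustrated with a small sketch that splits a name into separate fields, separates a number from the words around it, and keeps the original values alongside the derived ones. The field names, the sample record and the splitting rules are assumptions made for illustration; the rules are deliberately naive (for example, they treat the last word as the surname) and real sources would need more careful handling.

```python
import re

def split_name(full_name):
    """Naive rule: the last word is the surname, everything before it is forenames."""
    forenames, _, surname = full_name.strip().rpartition(" ")
    return forenames, surname

def split_age(age_text):
    """Separate the number from any qualifying words such as 'about'."""
    match = re.search(r"\d+", age_text)
    number = int(match.group()) if match else None
    note = re.sub(r"\d+", "", age_text).strip() or None
    return number, note

# Hypothetical record: the values as written in the source are kept, not overwritten.
record = {"name_as_written": "Mary Ann Smith", "age_as_written": "about 40"}
record["forenames"], record["surname"] = split_name(record["name_as_written"])
record["age_years"], record["age_note"] = split_age(record["age_as_written"])
print(record)
```

Because the new fields are added next to the originals, people can be sorted by `surname` and ages can be treated as numbers without losing what the source actually said.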
History Data Service13 Further Information Starting Out: Michael J. Hernandez, Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, Addison-Wesley; Database Central, History Data Service. The Classics: Charles Harvey & Jon Press, Databases in Historical Research, Macmillan Press; C. J. Date, An Introduction to Database Systems, Addison-Wesley, 1999 (7th ed.).
History Data Service14 Source Layer Acts as the reference version of the original source. –An accurate representation of the source, including errors, omissions etc. –Contents determine the highest level of detail available about the source in the database. –Includes a reference to the non-digital original source. –Includes a unique identifier for each item. Implementation: –as long text fields containing full text transcriptions. –as blob fields containing scanned images. –as a regular database table. –as a pivoted database table.
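As a sketch of the "regular database table" option, a source layer table might hold a unique identifier, a reference to the non-digital original, and a verbatim transcription. The table name, column names, archival reference and transcription below are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE source_entry (
    entry_id      INTEGER PRIMARY KEY,  -- unique identifier for each item
    source_ref    TEXT NOT NULL,        -- reference to the non-digital original
    transcription TEXT NOT NULL         -- text as written, errors and abbreviations included
)
""")

# Hypothetical entry, stored exactly as it appears in the (invented) source.
conn.execute(
    "INSERT INTO source_entry (source_ref, transcription) VALUES (?, ?)",
    ("Parish register, vol. 2, f. 14r (hypothetical)", "Jno. Smyth, labr., aged abt 40"),
)

entry = conn.execute(
    "SELECT entry_id, source_ref, transcription FROM source_entry"
).fetchone()
print(entry)
```

Nothing is corrected or expanded at this stage; abbreviations such as "Jno." and "labr." are left for the standardisation layer.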
History Data Service15 Standardisation Layer Organises the information into discrete units with fully defined contents. –Separates information in the source into separate fields according to data type and data content. –Simplifies the data by standardising and coding it. –Normalises the data. –Includes links back to the source layer. Implementation: –Possibly as additional columns in source layer tables. –Probably as separate tables with, ideally, a one-to-one relationship to records in the source layer. Normalisation: a series of rules that are applied to data to ensure that it conforms to the relational data model: 1. remove repeating groups (first normal form). 2. remove partial dependencies (second normal form). 3. remove indirect dependencies (third normal form).
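One way to sketch the "separate tables" option is a standardised table that shares its key with the source layer (giving a one-to-one link back to it) and stores coded values that refer to a lookup table. All table names, column names, codes and data here are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE source_entry (
    entry_id INTEGER PRIMARY KEY,
    transcription TEXT NOT NULL
);
CREATE TABLE occupation_code (
    code TEXT PRIMARY KEY,
    label TEXT NOT NULL             -- each label is stored once, not repeated per record
);
CREATE TABLE std_entry (
    entry_id INTEGER PRIMARY KEY REFERENCES source_entry(entry_id),  -- one-to-one link to the source
    surname_std TEXT,
    occupation_code TEXT REFERENCES occupation_code(code),
    age_years INTEGER
);
""")

# Hypothetical data: the source text is untouched, the standardised row sits beside it.
conn.execute("INSERT INTO source_entry VALUES (1, 'Jno. Smyth, labr., aged abt 40')")
conn.execute("INSERT INTO occupation_code VALUES ('LAB', 'Labourer')")
conn.execute("INSERT INTO std_entry VALUES (1, 'Smith', 'LAB', 40)")

row = conn.execute("""
    SELECT s.transcription, e.surname_std, o.label, e.age_years
    FROM std_entry e
    JOIN source_entry s ON s.entry_id = e.entry_id
    JOIN occupation_code o ON o.code = e.occupation_code
""").fetchone()
print(row)
```

Keeping the occupation label in its own table means it depends only on the code, which is the kind of indirect dependency that third normal form removes from the main table.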
History Data Service16 Interpretation Layer Creates historical entities from the data and the knowledge and expertise of the historian. –Incorporates interpolations and extrapolations from the data in the standardisation layer. –Selectively includes and excludes information from the standardisation layer. –Links separate records to form entities such as individuals or households. –Many-to-many relationship with records in the standardisation layer. Many-to-many relationships are usually converted into two one-to-many relationships to remove data redundancy.
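The conversion of a many-to-many relationship into two one-to-many relationships is usually done with a linking table. The sketch below assumes hypothetical `entity`, `std_entry` and `entity_entry` tables and invented record links; which records belong to which individual or household is the historian's judgement, not something the code decides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE std_entry (entry_id INTEGER PRIMARY KEY, surname_std TEXT, year INTEGER);
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY, kind TEXT, label TEXT);
-- Linking table: one row per (entity, record) pair.
CREATE TABLE entity_entry (
    entity_id INTEGER REFERENCES entity(entity_id),
    entry_id  INTEGER REFERENCES std_entry(entry_id),
    PRIMARY KEY (entity_id, entry_id)
);
""")

# Hypothetical standardised records and interpreted entities.
conn.executemany("INSERT INTO std_entry VALUES (?, ?, ?)",
                 [(1, "Smith", 1841), (2, "Smith", 1851), (3, "Jones", 1851)])
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)",
                 [(1, "individual", "John Smith"), (2, "household", "Mill Lane household")])
# Record 2 is linked both to the individual and to the household.
conn.executemany("INSERT INTO entity_entry VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2), (2, 3)])

entries_for_individual = [r[0] for r in conn.execute(
    "SELECT entry_id FROM entity_entry WHERE entity_id = 1 ORDER BY entry_id")]
entities_for_entry_2 = [r[0] for r in conn.execute(
    "SELECT entity_id FROM entity_entry WHERE entry_id = 2 ORDER BY entity_id")]
print(entries_for_individual, entities_for_entry_2)
```

Each side of the relationship is one-to-many with the linking table, so neither the entity rows nor the record rows have to be duplicated.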