Preserving for the Future Mike King Systems Manager UK Data Archive (University of Essex)

Unseen hub of the Archive
[Diagram labels: Depositor Services, Systems and Preservation, User Services, Projects, Research and Development, Data In, Data Out]

Preservation in outline
We currently preserve
– Approximately 4,500 studies, of which more than 30 are generic
– About 600 GB of data, with capacity for more than 3 TB on the main system
– 236,000 files in 52,000 directories (average file size 2.71 MB)
– Growth of about 100 GB per year
More than 32 years of electronic data preservation
23 + 7 + 2 cumulative years of experience
Have (so far) not lost any data!

Consistent Directory Structure (simplified)
[Diagram: directory tree rooted at the Study Number. Labels: Note and Read files; Data format files (SPSS exp, SAS, SIR); mrdoc; Original deposited format; Machine readable document files (pdf, word, ascii); Processing information and control files]

Standard and consistent directory structure for all preserved data
Standard directory structure for complete dataset
– Everything in a specific location
– All studies can be treated the same
– Consistent structure makes precisely locating information easy
– Makes caching of specific information types simple
– Makes future migration to other systems and formats easier
Data and documentation stored in portable format
– Ability to freely and intelligently read on many platforms
– Easier conversion to required format
– Easier migration to new portable format

Segregation of UKDA Servers
[Diagram; labels in extraction order: Main Preservation, Study Updates, Front end, Offsite, Near-Site, Shadow, CDROM, Restricted Data Download, Unrestricted Documentation, Web Server, General User Web Access, Archive Staff, Database Server]

Multi-copy, multi-storage media and multi-version resilience
Two copies on separate media in main tape library system
Up to 10 different versions of each individual file in the shadow area
Tape monitoring and refresh strategy
Read-only CD-ROM copy with error checking
Dissemination copy to reduce load on main system
Near-site copy
Off-site copy (located at Cambridge), highly encrypted
All with preservation metadata to confirm the copies are identical (dates, location and MD5 checksum)
Backed by anti-virus protection and Tripwire detection
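
The slide above says that MD5 checksums in the preservation metadata confirm the copies are identical. A minimal Python sketch of that kind of check follows. It is not the Archive's actual tooling: the function names `md5_of` and `copies_match` are hypothetical, and the sketch assumes the recorded checksum is held as a hexadecimal string.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(recorded_md5: str, copies: list[Path]) -> dict[Path, bool]:
    """Map each copy to True if its current MD5 equals the recorded value.

    Hypothetical helper: 'recorded_md5' stands in for the checksum held in
    the preservation metadata; 'copies' are the paths of the stored copies.
    """
    expected = recorded_md5.lower()
    return {copy: md5_of(copy) == expected for copy in copies}
```

Any copy that maps to `False` has changed since its checksum was recorded and would be a candidate for restoring from one of the other copies.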

Secure Environment
Blanket coverage by centrally controlled anti-virus protection
Tripwire detection on main systems
Firewall prevention
Good practice modelled on the Information Security Management standard BS7799
Machine room conforms to British Standards, including access, temperature and fire prevention

British Standards
BS7799: Information Security Management
Machine room conforms to main fire and environmental control standards
– Conforms to BS5588 parts 3 and 9, BS5839 parts 1, 2 and 3, BS5306 part 4, BS7083, BS6266, and BS4783 parts 4, 5 and 7

Disaster Recovery
Prevent rather than cure, but...
– Single unreadable file (shadowed)
– Single tape in library (shadowed and replicated)
– Single disk (RAID 5 fault tolerance)
– Main machine (standby near-site server)
– Disaster at Essex (off-site copy)
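
One way to read the list above is as an ordered fallback: try the nearest copy first and move outward only when it fails. The Python sketch below illustrates that reading under stated assumptions. The `recover` function, the `Fetcher` type and the tier labels are hypothetical and do not describe the Archive's actual recovery procedure.

```python
from typing import Callable, Optional

# A fetcher returns the file's bytes, or None if that copy is unavailable.
Fetcher = Callable[[str], Optional[bytes]]


def recover(name: str, tiers: list[tuple[str, Fetcher]]) -> tuple[str, bytes]:
    """Try each storage tier in order; return (tier_label, data) from the first hit."""
    for label, fetch in tiers:
        data = fetch(name)
        if data is not None:
            return label, data
    # Every tier failed: surface this loudly rather than returning empty data.
    raise LookupError(f"no tier could supply {name!r}")


# Illustrative stand-ins for real storage: in-memory dictionaries.
shadow: dict[str, bytes] = {}
near_site = {"study/data.por": b"example"}
off_site = {"study/data.por": b"example"}

tiers = [
    ("shadow", shadow.get),
    ("near-site", near_site.get),
    ("off-site", off_site.get),
]
label, data = recover("study/data.por", tiers)  # served by the near-site copy
```

Ordering the tiers from nearest to farthest keeps routine recoveries cheap and leaves the off-site copy for the case where everything at the main site is unavailable.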

Future developments
Most driven by developments in other sections of the Archive
Strive towards greater ease of use while remaining secure
Technology watch: obsolete software, formats and hardware