GRAD 521, Research Data Management Winter 2014 – Lecture 7 Amanda L. Whitmire, Asst. Professor
Follow-up from last class: What is a reasonable timeline for DCP? [Figure: five-week calendar grid, Monday through Friday]
Overview for today. Why? Where to store data: local drive | network drive | cloud (consider capacity & access by co-workers). Data backup: disaster recovery (research continuity). Data security: corruption or loss (hardware failure or data deletion); confidentiality (personal or intellectual property).
Why data storage, backup & security are important: “Your data are the life blood of your research. If you lose your data, recovery could be slow, costly or even worse… it could be impossible.”
Most common loss scenario: drive failure
This happens a lot: physical theft & unintentional damage. Cute, but not a valid security plan.
Rare, unexpected events happen: University of Southampton, School of Electronics and Computer Science, Southampton, UK, 2005
Storage: PC/laptop. Advantages: convenient. Disadvantages: drive failure is common; laptops are susceptible to theft & unintentional damage; not replicated. Bottom line: do NOT use to store master copies of data; not a long-term storage solution; back up important data & files regularly.
Storage: external storage devices. Advantages: convenient, cheap & portable. Disadvantages: longevity not guaranteed (e.g. Zip disks); errors writing to CD/DVD are common; easily damaged, misplaced or lost (= security risk); may not be big enough to hold all data, so multiple drives are needed. Bottom line: do NOT use to store master copies of data; not recommended for long-term storage.
Storage: networked drives. Advantages: data in a single place, backed up regularly; replicated storage is not vulnerable to loss due to hardware failure; secure storage minimizes risk of loss, theft or unauthorized use; available as needed (assuming the network is available). Disadvantages: cost may be prohibitive; export control. Bottom line: highly recommended for master copies of data; recommended for long-term storage (~5 years).
Storage: cloud storage. Advantages: data in a single place, backed up regularly; replicated storage is not vulnerable to loss due to hardware failure; secure storage minimizes risk of loss, theft or unauthorized use. Disadvantages: cost may be prohibitive; upload/download bottleneck & fees; longevity?; export control. Bottom line: possibly recommended for master copies of data; not recommended for in-process data or large files.
Storage: Google Drive for OSU. Advantages: all the same advantages as network & cloud storage; file sharing & collaboration with variable access levels; unlimited storage (GD), 30 GB non-GD; automatic version control on GD. Disadvantages: 30 GB may not be enough; upload/download bottleneck. Bottom line: possibly recommended for master copies of data; possibly not recommended for in-process data or large files.
Data backup “Keeping backups is probably your most important data management task.” -Everyone
Data backup. Best practice: 3 copies of datasets: original, external local, external remote.
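The three-copy practice can be sketched in a few lines of Python. This is only an illustration, not a recommended backup tool: the function name `make_copies` is hypothetical, and the "external remote" location is assumed to be reachable as an ordinary mounted path.

```python
import shutil
from pathlib import Path


def make_copies(original, backup_dirs):
    """Copy one file into each backup directory and return the new paths.

    `backup_dirs` stands in for the external local and external remote
    locations; both are assumed to be plain directories here.
    """
    original = Path(original)
    copies = []
    for backup_dir in backup_dirs:
        backup_dir = Path(backup_dir)
        backup_dir.mkdir(parents=True, exist_ok=True)
        # copy2 copies the contents and, where possible, the timestamps
        target = shutil.copy2(original, backup_dir / original.name)
        copies.append(Path(target))
    return copies
```

Called with one external local and one external remote directory, this leaves three copies in total: the original plus the two returned paths.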
Backups: full. Advantages: data can be easily & fully restored from a recent full backup. Disadvantages: time consuming; takes up the most storage. Bottom line: recommended for master copies of data; frequency depends on data size & mutability.
Backups: differential. Advantages: data can be easily & fully restored from a full backup + 1 differential backup. Disadvantages: size of each differential backup increases each time; backup window increases each time. Bottom line: frequency depends on data size & mutability.
Backups: incremental. Advantages: smallest file size between backups (full or incremental); shortest backup window. Disadvantages: when you need to restore data, the full backup + all incremental backups are required = more difficult restore scenario. Bottom line: frequency depends on data size & mutability.
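The difference between the three backup types comes down to which files each one copies. The sketch below makes that concrete under a simplifying assumption: a file counts as "changed" when its modification time is later than a cutoff. Real backup software may track changes differently, and the function name `files_to_back_up` and its parameters are hypothetical.

```python
from pathlib import Path


def files_to_back_up(root, kind, last_full, last_backup):
    """Return the files under `root` that a backup of the given kind copies.

    `last_full` and `last_backup` are POSIX timestamps (seconds) of the
    last full backup and of the most recent backup of any kind.
    """
    if kind == "full":
        cutoff = None          # copy everything
    elif kind == "differential":
        cutoff = last_full     # everything changed since the last FULL backup
    elif kind == "incremental":
        cutoff = last_backup   # only what changed since the LAST backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")

    selected = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and (cutoff is None or path.stat().st_mtime > cutoff):
            selected.append(path)
    return selected
```

Because the differential cutoff stays fixed at the last full backup, each differential run picks up more files than the one before; the incremental cutoff moves forward with every run, which keeps each backup small but means a restore needs the whole chain.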
Backups: bottom line. Pick a strategy. Be consistent. Test your approach!
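One way to test a backup is to compare it file by file against the original. The sketch below does this with SHA-256 checksums from Python's standard library; the function names `sha256_of` and `verify_backup` are hypothetical, and the backup is assumed to be restored or mounted as a normal directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from the backup or differ."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every source file has an identical copy in the backup; anything else names the files to investigate.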
Data security “Data security is the means of ensuring that research data are kept safe from corruption and that access is suitably controlled.”
Data security. It is important to consider the security of your data to prevent: accidental or malicious damage/modification to data; theft of valuable data; breach of confidentiality agreements and privacy laws; premature release of data, which can void intellectual property claims; release before data have been checked for accuracy and authenticity.
Data security. There are different levels of security to consider for your research data. Access: the mechanisms for limiting the availability of your data. Systems: protecting your hardware and software systems. Data integrity: the mechanisms for ensuring that your data are not manipulated in an unauthorized way.
Data security: access. Limit the availability of your data. ID/password: step 1, for everyone really. Role-based access: limited privileges/permissions to data depending on the user. Wireless devices: lack anti-virus software and firewalls; vulnerable to data theft & theft of the device. Use a PIN; limit storage of sensitive data on the device.
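Role-based access can be pictured as a lookup from a user's role to the actions that role is permitted. The sketch below is purely illustrative: the role names, the actions and the function `is_allowed` are all hypothetical, and in practice access is enforced by the storage system or operating system rather than by a script like this.

```python
# Hypothetical roles and the actions each one may perform on a dataset
ROLE_PERMISSIONS = {
    "principal_investigator": {"read", "write", "delete"},
    "analyst": {"read", "write"},
    "collaborator": {"read"},
}


def is_allowed(role, action):
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

The point of the design is that permissions attach to the role, not the person, so changing what a collaborator may do means editing one entry rather than every user account.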
Data security: systems. Protect your hardware & software systems. Anti-virus software: required on all OSU computers. OS & media software: keep them up to date. Firewalls: block unwanted network traffic from reaching your computer or server (e.g. typical home router). Intrusion detection software: detects & alerts, does not prevent. Physical access: locked office; password on wake; cable lock for laptops.
Data security: data integrity. Protect the integrity of your data at the file level. Encryption: the process of converting data into an unreadable code. You must have access to a password or a secret encryption key to be able to read an encrypted file. Check with the OSU Data Security team for advice (no “one size fits all” solution). Electronic signatures: meant to ensure the authenticity of the signer and, by extension, the document; now carry legal significance. Watermarking: embeds a digital marker for authorship verification and can alert someone to alterations made to data files; most often used with images & media.
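A simpler relative of these techniques is a keyed checksum, which detects whether a file's contents changed since a tag was computed. To be plain about the swap: the sketch below uses HMAC-SHA256 from Python's standard library, which is not encryption, not a legally recognized electronic signature and not watermarking; it only illustrates the idea of detecting unauthorized modification. The function names are hypothetical, and the secret key is assumed to be stored separately from the data.

```python
import hashlib
import hmac


def integrity_tag(data, key):
    """Return an HMAC-SHA256 tag (hex string) for the given bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unaltered(data, key, expected_tag):
    """Return True if the data still match the tag computed earlier."""
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(integrity_tag(data, key), expected_tag)
```

Someone who changes the data without knowing the key cannot produce a matching tag, which is what separates this from a plain checksum.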
Exercise: Complete the ‘Data Storage, Backup & Security Checklist’.