Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Storage Best Practices

Similar presentations


Presentation on theme: "Data Storage Best Practices"— Presentation transcript:

1 Data Storage Best Practices
Bill Corey Research Data Management Librarian UVa Library Research Data Services and Sciences

2 Welcome! Introduce yourself, and tell us what you would like to get out of this workshop.

3 Introduction Data Storage is a critical part of every research project. This workshop will explore digital data storage best practices, storage options available from UVa through ITS, options from 3rd parties, data sharing, and data security requirements. We will look at UVaBox, Box, Dropbox, Google Docs, Google Drive, Amazon AWS, and Microsoft OneDrive, SharePoint, and Azure. Let us start with what data is/are, followed by a look at the history of data storage. We will visit a few websites that show the timeline for several different data types.

4 What is/are Data? data noun, plural in form but singular or plural in construction, often attributive da·​ta | \ˈdā-tə, ˈda- also ˈdä- \ Definition of data 1 : factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation 2 : information in digital form that can be transmitted or processed 3 : information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful Defining “data” is simple in a dictionary. Unfortunately, if I asked everyone in this room I’d probably get 10 different answers. How we define data is often determined by our discipline and our experience. We talk about active data, and raw data, and research data, and shared data, and your data and my data. From the storage perspective, we should talk about what kind of data we want to store.

5 Data Types Data generally fall into 5 categories:
Observational: Captured in real-time. Cannot be reproduced or recaptured. Sometimes called 'unique data'. Example include sensor data, human observation, and survey results. Experimental: Data from lab equipment and under controlled conditions. Usually reproducible, but expensive to do so. Examples include gene sequences, chromatograms, spectroscopy. Simulation: Data generated from test models studying actual or theoretical systems. Models and metadata where the input may be of greater importance than the output. Examples include climate models, economic models, systems engineering. Derived or Compiled: The results of data analysis, or aggregated from multiple sources. Reproducible, but very expensive. Examples include text and data mining, compiled databases, 3D models. Reference or Canonical: Fixed or organic collection datasets, usually peer-reviewed, and often published and curated. Examples include gene sequence databanks, census data, chemical structures. You probably work with several types of data. These are 5 commonly used descriptors of research data types. But this doesn’t really help us very much as we start to explore data storage options. We need to know what the format of the data we want to store are.

6 Data File Formats A file format is the structure of how information is stored in a digital file. There are literally hundreds of file formats. Many are obsolete or proprietary. You want to use ones you are comfortable with, that are required by a specific instrument, process, or software program. Non-proprietary, standards-based, and open formats are preferred for data sharing and data preservation. Unencrypted and uncompressed formats are recommended for images, video, and audio files. Proprietary formats are sometimes required to preserve the richness of data. When you are working with data on a project it makes sense to use the formats that work best with the software and instruments you need. Therefore, Active storage should be format agnostic. When you have completed using that data file, you should think about the correct format for long-term storage. Format obsolescence is a very real and serious problem. Data migration to newer formats and fresh media should be part of your data storage strategy. Plan for the future with any data storage activity. You may need your data in one format to analyze it, and another to visualize it. You may want to share it with a collaborator who needs it in a specific format. Perhaps the software tool you need to use requires a format that only it has. Think about how software and hardware change over time. Proprietary software often has a short “shelf life”. Microsoft has changed Word quite a few times over the years. If you try to open an old 1989 copy of a Word file with the 2016 version, you will get nothing but gibberish. (The first Word program was released in It was in MS-DOS). Later versions are not always backwards compatible. The same holds true for many proprietary software products. Hardware also changes over time. Operating systems change, hard drive file formats change. The mechanical bits of computers change and break down. The data may be still on an older hard drive, but if the software that reads it is no longer available, or the bearings that spin the disc are frozen, you will never be able to access that data. Access for your collaborators is something you need to be aware of. If they are at UVa, they are probably using the same systems as you, or the data can be migrated to a format you can both access and use. But if they are in Europe, or Asia, they may not have access to the same equipment or software. This is especially important if you are working with highly sensitive data, such as personal identifiers or HIPPA regulated data. Do your collaborators have access to your data storage environments? Data Formats will be where we begin our short history lesson.

7 Recommended Digital Data Formats
Text, Documentation, Scripts: XML, PDF/A, HTML, Plain Text. Still Image: TIFF, JPEG 2000, PNG, JPEG/JFIF, DNG (digital negative), BMP, GIF. Geospatial: Shapefile (SHP, DBF, SHX), GeoTIFF, NetCDF. Graphic Image: raster formats: TIFF, JPEG2000, PNG, JPEG/JFIF, DNG, BMP, GIF. Graphic Image: vector formats: Scalable vector graphics, AutoCAD Drawing Interchange Format, Encapsulated Postscripts, Shape files. Graphic Image: cartographic: Most complete data, GeoTIFF, GeoPDF, GeoJPEG2000, Shapefile. Audio: WAVE, AIFF, MP3, MXF, FLAC. Video: MOV, MPEG-4, AVI, MXF. Database: XML, CSV, TAB. These are from the Library of Congress Recommended Formats Statement:

8 History of Data Storage Devices
There are a lot of infographics and websites that cover the ‘history of data formats’. The problem is that they pretty much focus on one type of data. This infographic is from Frontier Internet. They’ve pulled just 14 data formats, starting in

9 History of Data Storage Devices
The Computer History Museum’s Memory & Storage exhibition covers a lot of history. It isn’t a timeline, but each tab provided information about the systems and technology. It is worth a visit when you have the time.

10 History of Data Storage Devices
The Voiden Blog’s A Chronology of Computer History is a fascinating trip along the timeline of computer history. It runs from 3000 BC through 1994.

11 Data Format Timeline The Museum of Obsolete Media has the most extensive directory of data formats. And they are all obsolete. Their collection runs from the early 1800’s to Now we’ll look at a couple of other data formats – audio and video.

12 Audio Format Timeline The Museum of Obsolete Media has an extensive list of obsolete audio formats. The Audio Engineering Society has a nice timeline of audio formats to

13 Audio Formats FileInfo.com has a list that includes all know audio formats. There are literally 100’s. Most proprietary.

14 Audio Formats for Music
The Vinyl Factory has a timeline for audio formats for music. Remember 8 track tapes, or Betamax, DAT, and Blue-ray?

15 Video Formats The Museum of Obsolete Media has an extensive list of obsolete video formats. VideOlson lists over 60 video formats from 1956 to

16 Film Formats Lets not forget film either. The Museum of Obsolete has a list of film formats from 17th century to Next we’ll dive into digital data media.

17 Digital Data & Digital Media
Digital Data is stored on three types of media Magnetic: Magnetic disks include the hard drive on your laptop, external hard drives, network environments and servers, and magnetic tape (reel-to-reel and cartridge). Optical: Optical media include Compact Disks (CD, CD-ROM, CD-R, CD-RW), Digital Versatile Disks (DVD, DVD+R, DVD-R, DVD-RAM, DVD+RW, DVD-RW), Write-Once, Read-Many (WORM) disks, High-definition DVD (Blu-ray & HD-DVD), Smart Cards, and Optical tape. Solid State: Flash memory cards, USB Flash drives, and Solid State Hard Drives. Digital data is stored on digital media. Digital media can be divided into three main types: Magnetic. On magnetic media, the digital data is encoded as microscopic magnetized needles on the surface of the medium (e.g., tape). Optical. On optical media, the digital data is encoded by creating microscopic holes in the surface of the medium (e.g., CD or DVD). Solid State.  Containing no moving parts, solid state media encode digital data by applying small voltages of electricity that temporarily induced a group of transistors either on or off.  (e.g., flash memory cards, USB removable media).  Based on the characteristics of the different types of media, access to the information is divided into two categories, sequential and random access.  Sequential. Sequential access requires the user to access specific information by accessing the preceding information on the medium. For example, if you want to view a specific portion of magnetic tape, you must first fast-forward through the preceding portion of the tape. Random. Some digital media allow users to access the stored information from any physical place on the media. For example, when you connect a flash drive to your computer or insert a DVD, you can access any single file stored on the media without having to first access all the files that precede it.

18 Digital Media Lifespan
What is the lifespan of different types of digital media? It depends on many factors: Original Quality Storage conditions: temperature, humidity, air, moisture, light. Age and Handling Frequency of access Manufacturers will tell you that a HDD will last 30 years. Real world usage shows only 3-5 years. Flash storage should last 5-10 years, but high usage (read-write cycles) will shorten that considerably. CD’s and DVD’s have a shelf life of 5-10 years unrecorded, but only 2-5 years recorded. Manufacturers said they would last 100 years! Magnetic tape will last years under ideal conditions (stable temperature and humidity). The other part of the problem: media obsolescence. Can you access the older media with current technology? Do you have access to the older technology? All digital media has finite life spans which are dependent on a number of factors, including manufacturing quality, age and condition before recording, handling and maintenance, frequency of access, and storage conditions.  Studies have indicated that under optimal conditions, the life expectancy of magnetic media ranges from 10 to 20 years for different types, while optical media may last as long as 30 years.  However, in real life situations, most media life expectancies are significantly less. Some of your old cassettes and 8-tracks will probably still play (if you can find a player), but the connecting tape will have rotted or dried up.

19 Best Practices: Data Storage
Data Storage can seem pretty simple. Just store the data on your hard drive or in the Cloud, right? You should ask yourself some questions about the data before you start making decisions about storage. How important is the data? Do I need to keep this data? Can the data be reproduced, or is it unique? How long do I want or need to keep the data? How fast do I need to access the data? How secure do I need to keep the data? Do other people need to access the data? What institutional or funder requirements need to be adhered to? Decide what to keep: Do you really need to keep all of it? 15 drafts of that research paper? All 7 data analysis attempts you made before you found the one that worked? 11 years of weather data that are available from a national data repository database? Blurry image files, data files you don’t remember what they came from or what they were for? Whittle it down to what you really need to keep. Storage costs money, and takes time. Keep the best copy of everything you intend to keep, and discard the rest. Remember to consider the needs of others: Can your raw data, or your analyzed data, or your image files, or your sensor data be of use to other researchers? Was the data you collected ‘unique data’ – in other words, can the same data be collected again, or was the event a one-of-a-kind occurrence? Do you have enough information about it (metadata) to make it useful to others? Does, or did, your funder require you to share that data? Have you already deposited your data in an institutional, discipline-specific, or general repository? Has the data been published? I sort my data into two categories which I call Active data (what I need right now to do my job, project, task) and Stored data (what I want to keep but don’t need immediately).

20 Best Practices: First Steps
Locate all of your data files that you want to store. Decide what you need to keep. Create a directory identifying your data files by name, format, size, and file location. Keep it current! Label the data containers, and the media in the containers. What happens if they get separated? Locate and/or create supplemental documentation (metadata) for each data file. Include variable names, descriptions, units, standards, instrument calibrations, codes, algorithms used to transform the data, software (including version and OS). What do you need to be able to use this data file if you do not remember? Organize the files. Document your organization methodology. Use it consistently. Locate everything to be stored: Where are the data now? Collect all of your disks, CDs, DVDs, floppies, image files, images on cameras, data on media cards, digital files, sensor data, s, documents, cloud locations, etc. Try to gather them in one place in a folder on your computer, or a series of folders, depending on the volume of material. At the very least, identify where they are by creating a directory. This can be as simple as a list on a piece of paper or a spreadsheet on your computer. Date it. a copy to yourself. Place a copy in your desk. Keep it current. If the data is on multiple devices, such as hard drives or CDs or DVDs, make sure they are properly identified and labeled, and they match the entry on the list. Label not just the container or package, but the media itself, if possible. The disk may be separated from the envelope. The drive may be moved to another box. Someone else may be looking for it.

21 Best Practices: File Naming
Be Consistent Use descriptive names Not too long (32 characters max); CamelCase Try to include time Date using YYYYMMDD (create chronological order) Use version numbers Don’t use special characters [*&^$!~] Don’t use spaces - use “-” or “_” Don’t change default extensions Identify different versions clearly Add zeroes in front for large data sets ( instead of 01 if you expect images) File names are the easiest way to indicate the contents of the file. Use terse names but indicative of their content. Want to uniquely id the data file. Be unique but reflect the file content. Think about the organizing principle, don’t just make up a system as you go along. Don’t’ make them too long, some scripting programs have a filename limit for file importing (reading) Don’t use blanks/spaces in file names, some software may not be able to read file names with blanks. Think about how the name will look in a directory with lots of other files, want to be able to “pick it out”. Name the files: There isn’t a default system for organizing your files, but there are a few best practices to keep in mind. Be consistent—keep the same file naming conventions for all of your files. Use descriptive names on folders, files and documents. If there are different versions of a document, make sure that the versions and/or dates are part of the description. Keep it simple—photos go in the ‘Photos’ folder, sensor data for the 3M Turbosynthesizeranalyzer goes in the ‘3MTurbo’ folder, data from your research project on the morbidity of male mice found under decayed oak firewood piles you conducted in 2009 goes in the folder ‘Mice Morbidity 2009’ folder. If it won’t make sense to you tomorrow, it definitely won’t make sense to you in 5 years.

22 Best Practices: File Organization
Folders named for major functions/activities Structure by date or event (especially subfolders) Names should be self-explanatory Avoid duplication Make it simple & consistent Organize the files: There isn’t a default system for organizing your files, but there are a few best practices to keep in mind. Be consistent—keep the same file naming conventions for all of your files. Use descriptive names on folders, files and documents. If there are different versions of a document, make sure that the versions and/or dates are part of the description. Keep it simple—photos go in the ‘Photos’ folder, sensor data for the 3M Turbosynthesizeranalyzer goes in the ‘3MTurbo’ folder, data from your research project on the morbidity of male mice found under decayed oak firewood piles you conducted in 2009 goes in the folder ‘Mice Morbidity 2009’ folder. If it won’t make sense to you tomorrow, it definitely won’t make sense to you in 5 years. Bulk Rename Utility (Windows, free) Renamer 4 (Mac) PSRenamer (Linux, Mac, or Windows, free) Metadata conversation here. Dublin Core and FRBR as helpful guides, *some suggest camel case, others all lowercase—doesn’t matter if you are consistent. File names are the easiest way to indicate the contents of the file. Use terse names but indicative of their content. Want to uniquely id the data file. Be unique but reflect the file content. Think about the organizing principle, don’t just make up a system as you go along. Don’t’ make them too long, some scripting programs have a filename limit for file importing (reading)

23 Best Practices: Primary Data
Never work with your primary data file! Always make a copy to work with. Your computer hard drive or working environment should only store your current working data file. Your primary (master or raw) data file should be stored in a safe environment, and backed up. Disasters and accidents do happen: Hardware failures Software problems Virus Infections Corrupted data files Power failures Hacking Stolen computers Human error Natural Disasters Media degradation Keep the original file as an read-only file. Give it a file name that can be used as the first part of all subsequent files related to it.

24 Best Practices: 3-2-1-Rule & Threat Zones
Data Backups are very important. Follow the Rule: Keep three copies of any important data files – a primary and two backups. Keep two copies on different digital media – a HDD and Flash drive. Keep one copy offsite, or at least offline. This is sometimes called Here-Near-Far. Working copy Here, primary backup Near, and second backup Far. What is a threat zone? A different geographical location from the one you are working in. If a natural disaster occurred in Virginia, and all of your data files are here, then you will probably lose everything. Put your primary, or primary backup, on a media that is secure, and send it to a friend or colleague in another state for safe keeping. Can the Cloud be considered a different zone? Yes, if the servers aren’t in the same zone you are in. Some cloud storage providers allow you to specify where your data is physically stored. Data redundancy is very important. Regular back-ups protect against disasters and accidental or malicious data loss. Ideally, your backup should be stored in a different zone: you want to be able to recover your data if something bad happens. If you keep your backup on the same device/media as your primary, or master, then in the case of a disaster you will lose all of it. 3-2-1 Rule: Keep 3 copies of your data. 2 different media (for example - hard drive and optical media). 1 copy stored offsite, ideally in a different threat zone. Here-near-far: Keep 3 copies of your data. 1 here (your working copy), 1 near (your primary backup), 1 far (your primary data, kept in a different threat zone) What is a threat zone? A different geographical location from the one you are working in. If a natural disaster occurred in Virginia, and all of your data files are here, then you will probably lose everything. Put your primary, or primary backup, on a media that is secure, and send it to a friend or family member in another state for safe keeping. Can the cloud be considered a different zone? Yes, but only if the servers aren’t in the same zone you are in. Backup your data frequently. Copy that backup frequently. Don’t touch the primary unless you need it. Ideally, create a schedule for backing up your data, and follow it. Have someone be responsible for the backup schedule. Follow-up. Periodically test your backed up data for accuracy and consistency. Your schedule should include this. Your schedule should also include the dates when data files are created, so you know when you should migrate to fresh media. Disaster recovery: If the worst happens, recover your data from the primary backup. You may have to start your analysis again, but it is better than having to collect the data again. If you have been saving and backing up your data on a regular schedule, you should lose no more than your work between backups. This is a great reason to do those backups weekly. If you have two storage locations, and they are both in Virginia, then they are in the same threat zone. If one is in North America, and one is in Europe, then they are in different threat zones, and the likelihood that something bad will happen to all copies of your data is minimized.

25 Best Practices: Backups
UVa has a license for CrashPlan, a cloud-enhanced desktop backup service from Code42. It is available to staff, faculty, instructors, and degree-seeking graduate students. It is only to be used with public and moderately sensitive data. The CrashPlan FAQs provide information including getting started and backup-related sections. CrashPlan has an extensive knowledge base available with guides, troubleshooting and a configuring help. It is available for both Windows and Mac machines. UVa had made CrashPlan available to home users with a 25% off deal with Code42. However, as of 10/23, CrashPlan Home is no longer available. They recommend Carbonite or CrashPlan for Small Business. ITS also provides backup for the servers it manages. There are many providers of Backup services available, both online and software-based. Automatic backups are a great option for ensuring that your data is always protected. Acronis, Paragon, and StorageCraft are some of the better companies in this space. UVa CrashPlan: CrashPlan FAQs: Help: Home use: Home 2:

26 Best Practices: Backup Validation
Manually check 5-10% of data files yearly. Is the file collection complete? Compare it to your directory of data files. Did the files transfer properly? Do a bit-by-bit comparison of random files. Use a MD-5 checksum (hash). The UK Data Archive has a good exercise that will help you understand checksum. Write-once media validation. Did you create a validation hash when you created optical disks? If so, compare. Volume and directory validation. Check the media for directory or volume corruption. Storage media integrity. Run a media scan on your HDD for bad sectors. Visual inspection. Does the media look ok?

27 Best Practices: Network Security & Access Control
Network security: Keep confidential or highly sensitive data off computers or servers connected to the internet Physical security: Access to buildings and rooms Encryption: Provides protection by scrambling data, so only the owner of the key or password can read the data. This protects the confidentiality of the data so that if an unauthorized person gained access to the storage device or service, they would be unable to see the data. It also protects the integrity of the data so that it cannot be tampered with without the owner knowing it. Computer systems & files: Use strong passwords on files and systems; use Virus protection (updated continuously and running!); Do not send personal or confidential data via or FTP. Transmit as encrypted data and require data access agreements or confidentiality agreements from recipients. VPN: A VPN scrambles data as it is transmitted between your mobile device and a server. This allows you to access sensitive data securely stored on a remote server. UVa offers several levels of VPNs. Some are designed to handle secure data. Network security: Keep highly sensitive, confidential or IP-valuable data off internet-connected servers (or behind firewalls). Physical security: Who has access to your office and to the data files? Be careful of allowing repairs by an outside company. Computer systems and files: Keep virus protection up to date, and active. Be sure your computer has a login password. Do not send personal or confidential data via or FTP. Encrypt your data prior to sending. VPN: If you have IRB requirements, be sure to follow them. You may need to address data storage when acquiring consent and confidentially agreements for data users. Many IRB’s require 2 levels of security when working with sensitive or confidential data. This can be a password-secured computer in a locked office, or password-protected data files on a password-protected computer. Encryption is another option to ensure the security of data.

28 Best Practices: Common Sense Tips
Desktops, tablets, laptops, and phones should NOT be used for storage of your raw, original, or only copy of data. Removable media – USB Sticks, Flash Cards, Memory Cards, CD’s, DVD’s, Cassettes, DAT’s, portable external HDD’s - should NOT be used for storage of your raw, original, or only copies of data. All removable media are subject to degradation and failure. It will happen. Removable media are all inherently vulnerable to temperature and humidity fluctuations, poor handling, air, moisture, light conditions, theft, mechanical breakdowns, forgetfulness. Manage your stored data. Visit often, at least once a year. Migrate your data media to new media on a pre-set schedule. Migrate to newer formats when possible. Migrate to newer software if possible. Always verify data consistency. Keep your directory up to date. Keep a current copy with your data storage options. Your desktop or laptop computer should store only your current working copy of your data. Your master copy should be stored in a safe environment, and it should be copied in another location as well. Data redundancy is very important. Removable media: this covers all types of media. USB sticks, flash cards, SD cards, etc. Optical media, such as CDs, DVDs. Magnetic media, such as magnetic tapes and hard drives. They are all inherently vulnerable to temperature and humidity fluctuations, poor handling, air, moisture, and light conditions. Removable media should be migrated to new media every 3-5 years. Hard drives should be migrated every 5 years. The consistency of the recorded data should be checked yearly. Manage the collections: You know what you are keeping, where it is, how it is organized, and you have multiple copies in different places, perhaps even on different media. The next step is to actively manage all of your stuff. Media breaks down over time, formats and software become obsolete, hardware changes. Think back to what your computer operating system was when you bought your first one. Odds are the latest machine has something completely different on it. The Library of Congress recommends that you check your documents and media at least once a year to be sure you can read them, and to create new media every 5 years or so to avoid data loss. If errors have occurred, always migrate to new media, ideally from a sound original if possible. Plan on migrating your data every 5-6 years if it is on hard drives. CDs and DVDs (optical media) have longer life expectancies, but there is no specific date for when they may fail. When CDs were first invented and marketed, the industry claimed a life expectancy of 100 years. Realistically, expect about 30 years under IDEAL CONDITIONS!! Magnetic tape will last years. The Minnesota State Archives has a nice explanation of digital media on their website at The best option is also the easiest for you: have someone else store your data, responsibly. Deposit in a repository that manages their collections. They will handle the duplication of the files, manage the migration to new formats, make sure of the integrity and usefulness of the data. Contact us at Research Data Services to help you find an appropriate repository. Hopefully, we’ll have a workshop next semester to help you select the right one.

29 UVa Data Storage Options - Research
Research Data Storage Visit the CADRE site for information about the Various options available At UVa. They manage the Rivanna and HPC resources. CADRE – Computation and Data Resource Exchange ( – has a very good diagram showing the options for research data storage at UVa. HPC: ARCS: Rivanna: ARCS:

30 UVa Data Storage Options - Enterprise
Enterprise Data Storage Visit the CADRE site for information about the Various options available At UVa. They manage the Rivanna and HPC resources. CADRE – Computation and Data Resource Exchange ( – has a very good diagram showing the options for enterprise data storage at UVa.

31 UVa Data Storage Options - ITS
UVa’s ITS provides three levels of storage. All are fee-based: Standard: Standard Storage with No Data Recovery (Tier IV), Standard Storage with Data Recovery (Tier III), and Standard Storage with Data Recovery (Tier III) and Backups. Premium: NAS High-performance storage. Premium Storage with No Data Recovery (Tier III), Premium Storage with Data Recovery (Tier I), Premium Storage with Data Recovery (Tier I) & Backups, Windows Environment Premium Storage (ES1, ES3) Value: Self managed for sensitive & non-sensitive UVa data. Academic Value Storage, and Research Value NAS Storage (researchers only). They provide RESSCU service to store mission critical electronic data at a remote storage facility in Blacksburg, VA. This is a co-owned (with Virginia Tech) Hierarchical Storage Manager (HSM) RESSCU: (behind NetBadge) Standard: Premium: Value:

32 UVa Data Storage Options - Personal
Personal computer UVa Box 1TB storage UVaCollab While not intended for data storage it provides 4 GB storage. Secure, behind NetBadge. Home Directory Service 4 GB storage External Hard Drives (HDD), Flash, USB drives Amazon Web Services OneDrive SharePoint Online Dropbox Google Drive 3rd party servers are available, but you need to remember to know the Terms of Use, Privacy policy, IP policy, and data security and ownership policies, just as you would with a cloud service provider. You will also have to budget their costs, either through funding or out-of-pocket. If you have a small amount of data and files (less than 4GB total) you can also use UVaCollab and Home Directory for storage, but not for sensitive data. They are managed by ITS, and they are behind Netbadge, so they are secure. UVa Box is available for data storage and files. You have 1TB capacity in it. You can upload files as large as 5GB. It can not be used for sensitive data. It is also behind Netbadge, so it is secure. Remember that for hosted services, such as the ITS servers, departmental servers, UVaCollab and Home Directory, the server equipment is managed by ITS. The responsibility for the management of them is not your concern, other than to be sure that the actual data management of your content is your responsibility. But it is always a good idea to keep a second copy off-site is a different location. For UVa Box, Box is responsible for the management of their servers (under contract w/ UVa), but you are responsible for your content.

33 UVa Data Storage Options – UVa Box
UVa Box is a free cloud-based storage and collaboration service available to all UVa eligible students, faculty, and staff. You can store up to 1 TB of non to moderately-sensitive data in it. It is accessible from anywhere you have internet access. You can integrate applications into Box. UVa Box is the same as Box – UVa has an signed agreement about data (which is why the medical folks don’t have access to it). But a non-UVa Box user can interact with a UVa Box user. The UVa Box FAQs can answer many of your questions. How to use UVa Box includes instructions for working with files, sharing & collaborating, and some of the Box features. The UVa Box User Responsibilities & Data Restrictions includes the information you need to understand what types of data you can put in your UVa Box account. The Box Community is a great place to learn about Box from the Knowledge Base and other Users. UVa Box FAQs How to use UVa Box: User Responsibilities & Data Restrictions: Community:

34 UVa Data Storage Options – UVa Collab
UVa Collab is the LMS – Learning Management System – at UVa. It is the University of Virginia’s central online environment for teaching, learning, collaboration, and research. UVACollab partners with faculty, staff, and students in the work that sustains the Academical Village—engaging in interactive discussions, joining virtual meetings, securely storing and sharing materials, and much more. Each site has a storage capacity of 4 GB. You can have multiple sites. It is secure (behind NetBadge) and you can collaborate with colleagues at non-UVa institutions. It is not designed to be used as a storage option, but 4 GB is sufficient for a lot of folks. Great for smaller projects! You can link to external sites and resources, and it has several integrated tools including 3rd party ones - a WordPress Blog and Confluence Wiki. The Kaltura Media Gallery is a good too if you work with images and videos. UVaCollab has a extensive Knowledge Base available when you need assistance. Portal: KB: ITS:

35 UVa Data Storage Options – Home Directory
UVa Home Directory provides: Convenient online file storage - 4 Gigabytes (GB). Backup copies of your documents - within certain time limits. Easy Web publishing - publish and manage webpages. Your personal webpage URL is UVA Computing ID. Easy access using a software program on your computer or via the Internet from any computer connected to the UVA network. There are FAQs for Windows users and Mac users. Files stored in Home Directory are backed up daily. File Recovery information can help you recover from a loss. Mapping a drive in Windows or Mac OS X is a useful feature that allows you to direct-connect to transfer files to and from your desktop. Home Directory: HSM Web Publishing: Windows: Mac: File recovery: Mapping Mac: Mapping Windows:

36 UVa Data Storage Options – SharePoint Online
SharePoint Online is part of the Office 365 suite of tools from Microsoft. ITS also offers Groups & Teams which include a limited version of SharePoint. The Jumpstart Guide for Microsoft Teams is a good place to start if you are interested. ITS also has information about the differences between the two products. The Microsoft SharePoint Online website provides a lot of information about this tool. SharePoint Online is great for: Content management Customization of HTML/CSS to change look and feel Strong use of permissions and inheritance in multi-site layout A site template other than the default (Publishing site, Communication Site). SharePoint Online is accessible from any internet connection, and the permissions allow great granularity to protect your data. It is integrated with the other Office 365 applications. I manage a SP site collection for the Research Data Services & Sciences group in the Library. If you have questions or would like a “tour” contact me directly. SharePoint Online: Groups & Teams: Jumpstart Guide: MS SP Online: Groups & Teams:

37 UVa Data Storage Options – External Drives
There are many types of external hard drives available. Remember that if you are storing UVa- related data (funded project, UVa data, etc.) on one you need to follow the UVa Data Protection Standards (3.0): . There are many options for acquiring external HDDs, SSDs, USB, Flash and other drives. Consider encryption as a way to protect your data.

38 Best Practices: Cloud Storage at UVa
Cloud storage can be used to store additional copies of data. Be very careful reading and understanding the terms of use, privacy policies, data ownership policy, and fee structure. Cloud storage is not appropriate for highly sensitive data (HIPPA). Moderately-sensitive data (FERPA) may be allowed if the provider has an agreement with the University. Be sure you are adhering to the University Data Protection Standards (UDPS 3.0) at if you are storing funded research data or UVa data. The Information Security office at UVa should be contacted if you have any questions about storing your data. Your desktop or laptop computer should store only your current working copy of your data. Your master copy should be stored in a safe environment, and it should be copied in another location as well. Data redundancy is very important. Removable media: this covers all types of media. USB sticks, flash cards, SD cards, etc. Optical media, such as CDs, DVDs. Magnetic media, such as magnetic tapes and hard drives. They are all inherently vulnerable to temperature and humidity fluctuations, poor handling, air, moisture, and light conditions. Removable media should be migrated to new media every 3-5 years. Hard drives should be migrated every 5 years. The consistency of the recorded data should be checked yearly. Cloud storage is a good backup option. But you need to be aware of the terms of use and privacy policies, data ownership policy, and costs. Cloud storage is not appropriate for sensitive data (HIPPA), but may be ok for moderately sensitive data (FERPA).

39 UVa Data Storage Options – AWS
Amazon Web Services provides a suite of tools to do everything with your data. UVa has a signed agreement with DLT Solutions, a 3rd party provider who is a Premier Consulting and Managed Services Partner for AWS. You will need an AWS account set up through ITS to access their services. You will then be able to access the DLT Portal. DLT has conducted several training sessions here in the last 2 years, and I have copies of the content I can share by request. AWS provides a long-term storage option – Glacier – which is inexpensive, and would be a good option for data you want to park. There is a local AWS User Group at UVA. This group is dedicated to connecting UVA students, alumni, and the residents of Charlottesville. Founded and organized by UVA alumni and students, they explore all aspects of working with AWS. Learn about new services and features, hear from developers who are using services in new and exciting ways, learn how to build an engineering career using AWS, and enjoy the company of others who are eager to share experiences. ITS AWS account: User Group: ia.service-now.com/its?id=kb_article&sys_id=1003cadd6f1d62800a017f512e3ee467 AWS Services: DLT Portal:

40 UVa Data Storage Options – OneDrive
OneDrive is the Office 365 cloud storage service that UVa has an agreement with Microsoft for UVa staff and faculty who use Office 365. Every user has 5 TB of storage available. It is integrated with Microsoft Office Online and many of the desktop applications. It is intended for University-related activities. You can get a personal account if you need one (or three). Users have access to assistance and training on the UVa Office 365 Resource Portal. There is an extensive list of FAQs and a page on File Sharing: Internal vs. External. The User Responsibilities & Data Restrictions page will provide you with the information you need to know about which data files you can store on OneDrive. OneDrive: Resource Portal: File Sharing: User Restrictions:

41 UVa Data Storage Options – Dropbox
Dropbox is not an “approved” cloud storage provider for UVa (we do not have a signed agreement with them). But a lot of folks use it. It is similar to Box in how it works and the functionality. There is a free version – Basic, and paid plans in Dropbox Business. They have an extensive knowledge base Help Center to answer your questions. It includes a Getting Started section. The Dropbox Community is a great resource. Remember to read the Terms of Use and Privacy policy for any Cloud-based service provider, even if they are approved by UVa. Dropbox: KB Help: Community: Getting Started:

42 UVa Data Storage Options – Google Drive
Google Drive is part of the Google Gmail package that UVa has for students and alumni. It is not recommended for staff and faculty use, though if you interact with the students on a regular basis it makes sense to have an account. The Gmail FAQs provide a lot of information, and Google Gmail Help is also available. The Google apps that UVa supports with Gmail are: Contacts, Drive and Docs, Calendar, Sites, Talk, and Video. Drive provides 15 GB of storage on personal accounts. Remember to read the Terms of Use and Privacy policy for any Cloud-based service provider, even if they are approved by UVa. Google Gmail: ITS Gmail FAQs: Gmail Help:

43 UVa Data Storage Options – Azure
UVa does not currently have an agreement with Microsoft for the Azure Computing Platform. Azure is a set of cloud services similar to the AWS system. Azure:

44 Research Data Services + Sciences
I am part of the Research Data Services + Sciences unit of the UVa Library. Our website has information on all of our services. The Research Data Management section includes a link to my Research Data Management Subject Guide. We offer an assortment of workshops each semester. We also have an archive of downloadable content from previous workshops. We are part of the CADRE group. Past: Current: RDS: RDM: CADRE:

45 Thanks for attending! Do you have additional questions? Feel free to contact me at to set up a consultation.


Download ppt "Data Storage Best Practices"

Similar presentations


Ads by Google