Storage of large research data volumes in AFS (on a very low budget)


1 Storage of large research data volumes in AFS (on a very low budget)
Richard Brittain, Dartmouth College
What this talk was going to be about: experience with a large expansion and with volumes pushing the limits. What this talk is about instead: the plans to do that, and why it took so long to get final buy-in to expand the AFS cell. An extended site report: large storage on the cheap.
June 2009 Dartmouth College Comp. Svcs.

2 Introduction
The northstar.dartmouth.edu cell
Expansion plans
Problems: financial, historical, cultural
Some case studies
Miscellaneous tools

3 Research Computing support group
A small cell (by some standards)
Cell name is a legacy of Project Northstar
Client mix has changed greatly over time

Research Computing Support people: code debugging, statistics, general help for Unix/Linux, administering the AFS cell, special projects, a Beowulf cluster (100 nodes / 800 CPUs), central systems (compute servers and infrastructure), a web server, license servers, misc. test servers, a test AFS cell, and test SAMBA. Engineering, CS, Physics and Math all do their own thing, but only Engineering has a sizeable central file store as a departmental resource. We mostly support faculty, grads and postdocs, plus some undergrads and a few staff.
- Small, with aspirations.
- Old (not good!) - the cell name is the only relic of a major project ("Athena lite") in the late 80s and early 90s.
- Originally mostly public workstations administered by Computing Services, for curricular use. Now there are fewer clients, and they are central servers (Linux, controlled by us) and faculty-owned desktops: a mix of Windows, Mac and Linux.
- K4/kaserver, naturally, and problems moving forward.

4 Cell Statistics
3 file servers, 3 DB servers
10 TB, 3 TB in use (+ 20 TB in the mail)
2434 volumes
701 user homes
50 'data' volumes (50-600 GB)
< 100 clients
2 physically separated data centers

- 10 TB total, approx 3 TB used; 2 TB goes to NetBackup, 1 TB is exempt. The new disk will add another 17.1 TB usable (20 TB raw), but no additional file servers (30 TB total).
- Volumes: 2434 total; 50 'data' volumes of 50 GB-600 GB; 701 user home volumes.
- Quotas: 10 GB / 1 GB (50 GB on demand, up to 1 TB temporary scratch). Current record: 950 GB of temporary scratch created in one day by a SAS programmer.
- Volume size distribution: the vast majority are < 1 GB, a few are > 100 GB.
- Clients: typically < 100 active at any time; Linux, Mac, Windows.
- Usage: compute servers (in MR), Discovery cluster head end (100 nodes / 800 CPUs; Discovery has Thumper/NFS distributed to the compute nodes), workstations we administer (all gone away), faculty-owned desktops (rising). The AFS client is on public Mac and Windows machines around campus, but not in common use because undergrads don't know about it.
- 2 data centers for FS and DB - we need to move and re-address one, which will be painful.
- Layout: use of /worldwide at the top level - nonstandard (a legacy of early Mac and Irix GUI file browsers) - and still that way.

5 Hardware
IBM x3650, EXP3000 disk vaults, LSI/ServeRAID controllers
750 GB SATA disks
1 Gbps between servers and central systems
100 Mbps to all departments and desktops
RHEL 5 everywhere; ext3 filesystems
vicepX partitions are 3.4 TB (1/2 vault)

- FS: IBM x3650 with MegaRAID or ServeRAID 10MR controllers, EXP3000 vaults, 750 GB SATA disks.
- DB: IBM x3250.
- 1 Gbps between all FS and DB servers, the central general-purpose Linux machines, and the Discovery cluster.
- 100 Mbps to all department labs and offices.
- RHEL 5 everywhere; ext3 filesystems.

6 Backups
vos dump | gzip > local disk on each server
NetBackup picks up the compressed images
2 TB staging space; 2:1 compression
Large data volumes get replicated instead
Monthly full / daily incremental
Work in progress to spread out the fulls

- vos dump | gzip > local staging, then NetBackup runs locally (and in parallel) on all file servers; load is 2-3 and it is typically done in under an hour. We thought about a dedicated backup server but so far have not done this. On 'full' days all servers have high load for 6-10 hours, mostly doing the gzips.
- How much space to dedicate to staging is an open question. Current file servers have 2 TB each of staging disk, and store more than a month of incrementals plus one compressed full.
- We get better than 2:1 compression on average and track compression ratios per volume in our logs. We may flag specific large volumes full of already-compressed data to skip compression - it's a waste of time. SAS datasets tend to be very fluffy, and large.
- Monthly full, daily incremental, all servers on the same day, with a global last-full-dump timestamp. Restores are simple; the crunch comes on the monthly full day. Staging keeps upwards of a month; we would like to avoid sending these to NetBackup altogether.
- Resend to NetBackup as many times as needed until it works - retry each day.
- We measured the overhead of providing .backup for all volumes, even scratch: typically < 1%.
- No file-level backup, no special backup ACL.
- Ways to improve: spread the fulls out over several days; skip volumes that don't need backing up at all; link the staging space. We are not using NetBackup incrementals - everything is a 'full' to NetBackup, and there is no way to 'reuse' a previous full which hasn't changed. That needs a database of additional volume metadata (also wanted for other reasons) - work in progress.
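A minimal sketch of this kind of dump-and-compress pass, assuming a per-server staging directory and a recorded last-full-dump timestamp; the paths, the "full" argument and the volume selection are illustrative, not the site's actual scripts:

  #!/bin/sh
  # Hypothetical staging layout; adjust paths for the local site.
  STAGE=/backup/staging
  LASTFULL=$STAGE/last_full_timestamp      # e.g. "06/01/2009 00:00"

  # Refresh the .backup clones for every volume on this server.
  vos backupsys -server $(hostname) -localauth

  # Dump each RW volume's .backup clone, compressed, into staging.
  for vol in $(vos listvol $(hostname) -quiet | awk '/ RW /{print $1}'); do
      if [ "$1" = "full" ]; then
          vos dump -id $vol.backup -time 0 -localauth | gzip > $STAGE/$vol.full.gz
      else
          # Incremental since the recorded last full dump.
          vos dump -id $vol.backup -time "$(cat $LASTFULL)" -localauth | gzip > $STAGE/$vol.incr.gz
      fi
  done

A restore is then roughly gunzip -c dumpfile | vos restore server partition volname.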

7 Growing pains
Plans for significant expansion, but no committed funding or increased manpower
Biologists have several TB, and will get more in the next couple of years
A chargeback model is needed
Legacy issues with the AFS cell

- Biology has several TB, expects more in the next couple of years, and has money to spend.
- Growth plans and the 'Research File Store': a commitment, but no money and no additional manpower.
- Chargeback model: how to calculate it, and how much to reserve for 'free' space or as an overhead percentage for system volumes. How much do we need to keep empty for the system to work? What is the overhead of .backup and local .readonly copies?
- Legacy and institutional memory: the cell name reminds users of the old Project Northstar - slow workstations, tiny quotas - and "kerberos" on the Dartmouth campus means Sidecar/KClient to many old-timers - retired technology.
- The new large paying users will be mostly on Windows desktops, so it is important that they get a good experience. Some users had a bad experience with the Windows client - install problems and insufficient instruction - and some Windows users wanted a no-install (clientless) solution: point CIFS at a server and go.

8 Flirtation with CIFS
Engineering School likes CIFS
Test SAMBA server
Research into SAMBA/AFS integration
Authentication requirements
Cost :-(
Plan B - back to AFS - but some of the buzzwords seem to fit again

- SAMBA experience: we investigated a SAMBA/AFS gateway and concluded it wasn't ready in a form we were willing to use. Plan B was always to expand the AFS cell. We set up a test SAMBA server to learn its capabilities.
- Current storage-system buzzwords: virtualized storage, replication, federated, scalable. AFS effectively has all of these. Vendor solutions cost too much, and SAN solutions don't reach end users in departments.
- Bottom line: buy in bits and pieces, because it will be cheaper next year, the researchers may not grow as fast as they think (our experience), and it leverages the AFS distributed model. Incremental growth is obviously the best value for money - buy larger, cheaper disks next year when demand increases.
- The Engineering school previously ran its own AFS cell; its AFS admins left, the cell was retired, and a NetApp was installed for CIFS. Clients are mostly Windows, without good desktop support in house (previously mostly AIX/IRIX/Linux). AD login is slaved to the master campus DND (K5/LDAP). Kerberized NFSv4 with canonical AD names is used for Linux clients - not user-installable, and hard, but they did it so Linux users would have homes on the NetApp, which can't be done with a CIFS mount. No client install (well, not quite) - shortcuts are set up automatically. Users aren't allowed to change ACLs, because their backup system doesn't preserve ACLs. No access from off campus except by VPN. Windows users like it, admins like it; it cost a lot, can't grow incrementally, and is expensive to maintain.
- There is a perception that we have only one chance to make a good first impression: if the system doesn't work well immediately, users won't take to it.
- Benefits of AFS: scalability, low cost, off-campus access without VPN, institutional sharing.
- Authentication against the central campus LDAP/K5 is highly desirable and we will try to move that way, maybe cross-realm. We will try to move towards SSO using the existing master realm, but there are issues: very long usernames, accounts that are not synchronised, and old accounts that do not map 1:1 in some cases.

9 The Opposition
USB drives $100/TB
Buffalo Terastation $150/TB
Fun with rsync
Resistance to lots of servers
NetBackup limits: use AFS replication as backup
Explored shadow volumes

- Adding a 'new file server' raises all sorts of questions about cost, space, power, cooling and necessity. Adding more disks to an existing server seems much less contentious, even though the disks account for most of the cost. The maximum disk we could add to a single server would be 18 vaults x 12 TB raw = 216 TB - not a good idea. Much better to get more servers and increase the overall flexibility. Maybe larger vicepX partitions to minimize orphaned space - whole vaults (7.5 TB)?
- Would like to try xfs, but RHEL doesn't support it.
- Default 'free' user homes are 10 GB (maybe to be increased) - we don't know how much this will grow as usage increases for research data storage. Free space and shared volumes for applications etc. are all subtracted from the chargeable amount; we don't expect to reduce this.
- Decided: no more volumes that are neither backed up nor replicated (a liability), except for a single global scratch volume. We pre-create a directory there for each user with a private ACL; we still need a purging algorithm that works - work in progress.
- Replication: two (or potentially more) copies as the only backup for the large data volumes (larger than about 500 GB - still exploring the gray area), with an explicit -rw mount point and the RO copy in the other data center. Release daily, or possibly on the user's requested schedule - they may want less than daily to allow a little historical recovery in case of user oopses. A volume naming scheme tracks which volumes should be replicated and not sent to NetBackup; see the sketch below.
- Name change? We are the Research Computing Support Group, and 'northstar' has now been removed from everything except the cell name. Try to use names like Research File Store, DartStore or Datastore, and avoid mentioning the cell name where possible. We are beginning to change mount points for all user homes and user data - more pain.
- Increased desktop client support: even though there are fewer clients in total, we don't control them now.
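As an illustration of the replicated-only-as-backup scheme, a minimal sketch. The volume name rep.bigdata and the mount-point paths are hypothetical (rep.* is the naming convention seen later in this talk); mizar and oort are the file servers named on the tools slide, but the server/partition choices here are illustrative:

  # RW volume plus one RO replica in the other data center,
  # mounted with an explicit -rw mount point alongside the RO path.
  vos create mizar a rep.bigdata -maxquota 0
  vos addsite oort a rep.bigdata
  vos release rep.bigdata

  fs mkmount -dir /afs/northstar.dartmouth.edu/data/bigdata  -vol rep.bigdata
  fs mkmount -dir /afs/northstar.dartmouth.edu/data/.bigdata -vol rep.bigdata -rw

  # Daily (or per-user-schedule) re-release from cron on the sync site:
  # 0 3 * * *  vos release rep.bigdata -localauth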

10 Case Study: Biology gene sequencers
1 TB per "run"
Typical files are 2 MB TIFFs
May be able to compress 4:1 or more
Store for 3 years minimum

Biology has multiple TB of gene sequencer data and image-analysis output (TIFF, PNG, JPEG), at 1 TB per run. There are many image files, typically 2 MB each (TIFF), but they compress about 4:1 losslessly, and much more as JPEG. We will work with the data owners, since compression makes a big difference to how big their volumes need to be. They will surely ask for > 2 TB as a single volume if it is possible.
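A quick way to estimate the achievable lossless ratio on a sample of run images; the path is hypothetical, and this should be run on a scratch copy since gzip compresses the files in place:

  # Estimate the lossless ratio on a scratch copy of one run's TIFFs.
  cd /tmp/run-sample
  before=$(du -sk . | awk '{print $1}')
  find . -name '*.tif' -exec gzip -9 {} +
  after=$(du -sk . | awk '{print $1}')
  echo "approx. compression ratio: $(echo "scale=1; $before/$after" | bc):1"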

11 Case Study: Medical School aspirin study
Long-term Medical School study: 8 years of aspirin data in SAS datasets
Several rounds of hardware and software upgrades
Many researchers came and went; the ACLs are a mess
Data are now frozen

DHMC long-term aspirin study: 11 years (7+4) of SAS data that has lived through changes of hardware, SAS versions, and a migration from 2 GB disks to AFS volumes. The ACLs are a horrible mess - many users and groups over the years. The data are used from central systems and workstations with SAS. Analysis is still ongoing, but there are no more updates to the data.

12 Case Study: Proteomics
Proteomics research
Data acquired on an unattended PC off campus
Written to AFS with an IP ACL
Visible to the Beowulf Cluster head end
High volume; no backups

A medical center researcher doing proteomics acquires data at a remote site (DHMC - a different, more restricted network) and dumps it into AFS to get it to the cluster. The Windows system that collects the data is also IP-ACL'd and drops new data directly into AFS, where jobs on the cluster pick it up. It uses a large volume, but a transient one that is not backed up - both ends back the data up themselves, first in original form and then in post-processed form.
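A minimal sketch of the IP-ACL arrangement, following the usual AFS pattern of a machine entry placed in a group that is then put on the ACL; the address, group name and path are hypothetical:

  # Machine (IP) entry, group, and ACL on the drop directory.
  # Unauthenticated connections from that host then get the group's rights.
  pts createuser -name 10.1.2.3
  pts creategroup -name rcs:proteomics-hosts
  pts adduser -user 10.1.2.3 -group rcs:proteomics-hosts
  fs setacl -dir /afs/northstar.dartmouth.edu/data/proteomics/incoming -acl rcs:proteomics-hosts rlidwk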

13 Case Study: Auroral Radio Noise
Auroral Radio Noise research in the Arctic
Multiple field sites, but the Greenland sites are the only ones on the Internet 24x7
High latency, behind NAT; AFS was not happy
scp daily summaries directly to Dartmouth, into AFS space, visible to the web server
Researcher happy

Greenland Auroral Radio Noise data: the network at the field site has very high latency and sits behind a NAT. AFS worked, but not well - it often timed out. We changed the setup to scp the data to Dartmouth, to the faculty member's machine, which drops it directly into an IP-ACL'd directory that is also visible to the web server. The instruments are a Linux imager and a DOS system that has been collecting 22 MB/day for 12 years and won't die; the imager is still under construction. Preprocessing produces PDFs ready for the web. Other data collected by these groups are a problem for central storage - about 1 GB/minute - so they are written to local disk and reduced first; we need to test cache-bypass clients and large volumes for that.
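A sketch of the daily hand-off, assuming cron jobs at both ends; the hostnames, paths and times are hypothetical:

  # On the field-site Linux box (crontab entry): push yesterday's summary.
  15 2 * * *  scp /data/summaries/$(date -d yesterday +\%Y\%m\%d).dat arn@faculty-host.dartmouth.edu:incoming/

  # On the faculty member's machine at Dartmouth: move the files into the
  # IP-ACL'd AFS directory that the web server also reads.
  30 2 * * *  mv $HOME/incoming/*.dat /afs/northstar.dartmouth.edu/data/arn/summaries/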

14 House call…
The DOS machine has been collecting 21 MB/day for 12 years. The only slightly newer Dell, running Linux, is the imager experiment.

15 Case Study: Biology scanner images
650 GB of scanner images stored on a Terastation
Pulled with rsync for several months
Now using AFS as primary storage, in a replicated volume

Biology, Terastation backup and rsync fun: 650 GB pulled over initially, then incrementally with rsync. There were security issues along the way.
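The periodic pull was presumably along these lines; the host, share and AFS path are hypothetical, and this sketch assumes rsync over ssh rather than an rsync daemon:

  # Nightly pull from the Terastation into the (replicated) AFS volume.
  rsync -av --delete admin@terastation.dartmouth.edu:/mnt/array1/scans/ \
        /afs/northstar.dartmouth.edu/data/bio-scans/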

16 Miscellaneous user tools
(Nothing like Russ's toolset.)

afsquota (a wrapper around fs listquota; we need to get rid of the partition values - they are wrong for > 2 TB partitions anyway):
  Volume Name    Quota    Used    % Used    Part.    Available
  user.richard   ... GB   ... GB  92%       278%     ... MB

freespace (a wrapper around vos partinfo):
  mizar     /vicepa: 1568 GB free out of 3416  (54.1% used)
  centaurus /vicepa: 2581 GB free out of 3416  (24.5% used)
  oort      /vicepa: 2806 GB free out of 3416  (17.9% used)

freespace -v adds the volume counts:
  mizar     /vicepa: 1568 GB free out of 3416  (54.1% used, onLine 272, offLine 0, busy 0)
  centaurus /vicepa: 2581 GB free out of 3416  (24.5% used, onLine 4551, offLine 0, busy 0)
  oort      /vicepa: 2806 GB free out of 3416  (17.9% used, onLine 92, offLine 0, busy 0)

listvols:
  users.b.readonly    RO    ... kB    On-line
  rc.mizar.a          RW    ... kB    On-line
  users.r.readonly    RO    ... kB    On-line
  datad.jhamilton     RW    ... GB    On-line
  .....
  rep.wibble          RW    ... GB    On-line
  rep.mcpeek          RW    ... GB    On-line
  Total volumes for server mizar:[a]  onLine 272; offLine 0; busy 0
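A minimal sketch of a freespace-style wrapper around vos partinfo; the server list and output formatting are illustrative, not the real tool:

  #!/bin/sh
  # Report free space per server partition in GB, as a percentage used.
  for server in mizar centaurus oort; do
      vos partinfo $server | while read -r line; do
          # vos partinfo prints lines like:
          #   Free space on partition /vicepa: 1643315200 K blocks out of total 3581853696
          part=$(echo "$line" | awk '{print $5}' | tr -d ':')
          free_k=$(echo "$line" | awk '{print $6}')
          total_k=$(echo "$line" | awk '{print $NF}')
          free_gb=$((free_k / 1048576))
          total_gb=$((total_k / 1048576))
          pct_used=$(awk "BEGIN {printf \"%.1f\", 100*($total_k-$free_k)/$total_k}")
          echo "$server $part: $free_gb GB free out of $total_gb ($pct_used% used)"
      done
  done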

17 Miscellaneous tools cont.
setacl
  setacl -Rv system:authuser,read publicstuff
moveafsvol
  moveafsvol dest-server dest-partition [volume-name ...]
klog_wrapper
  polaris [12:58pm] ~ $ klog rbadmin
  Running interactive shell with command logging
  Enter AFS (rbadmin) Password:
  bash-3.2$
  bash-3.2$ exit
autoconfigure: upserver, upclient, and make (really need to learn how to use puppet)

- setacl is a recursive wrapper around fs setacl that we use all the time: -R recurses (otherwise it wildcards arguments in the current directory); -v shows what it is doing. See the sketch below.
- moveafsvol usage: moveafsvol destination-server destination-partition [volume-name ...].
- klog_wrapper (and an su wrapper) exist so that admin sessions run in an interactive shell with command logging.
- The configuration tool for FS/DB machines is upserver, upclient and 'make' (we really need to learn how to use puppet or something like it).
- Other tools: mount-point policy enforcement and replication policy enforcement, both based on the volume naming convention; quota checking and expiration checking, based on external volume metadata.
- Autorelease runs on the (single) sync site with -localauth to release the rep.* volumes; it checks the replication policy using a regex against volume names.
- Monitoring: we keep one small volume on every vicep partition (enforced by a script) and try to read a file from it (flushed from the cache first) from some client. The failure this catches: the server is up, the AFS processes are up, Big Brother is green, the network monitor is green - but a vault lost power due to a UPS failure while the server was on a different UPS, and all file access attempts gave read errors.
- Volume split and volume merge are an ugly mess of 'tar' and recursive copyacl. A native 'split' would be great; a better merge would be great. 'up' is good but doesn't preserve hard links within directories, which several commercial application installers like to make.
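A minimal sketch of the recursive setacl idea; the real wrapper's flag parsing and wildcard handling are omitted here:

  #!/bin/sh
  # usage: setacl.sh system:authuser,read publicstuff
  acl=$1
  dir=$2
  user=${acl%,*}
  rights=${acl#*,}
  # Apply the ACL entry to the directory and everything under it.
  find "$dir" -type d | while read -r d; do
      echo "fs setacl $d $user $rights"          # -v style progress output
      fs setacl -dir "$d" -acl "$user" "$rights"
  done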

