Status: Central Storage Services CD/LSC/CSI/CSG June 26, 2007.

Presentation transcript:

1 Status: Central Storage Services CD/LSC/CSI/CSG June 26, 2007

2 Storage Services
1. File Based Storage
   - NFS/CIFS (BlueArc) – fast on-site access
   - AFS – global access, authenticated filesystem
2. Block Based Storage
   - Fibre Channel connection to SAN
3. Archival Storage
   - Backups

3 NAS Status
Newest service; 2 production clusters:
1. Fermi-Blue (1st generation cluster)
2. RHEA (2nd generation cluster)

4 NAS Status
3/06 – NAS heads ordered (Fermi-Blue)
5/06 – Pilot deployment: SLF, DSG, KITS, PPD and FESS department servers
Year 1 projection: 10TB deployed storage

5 NAS Status – Projected Rollout
- Phase 1 (Year 1): Department servers, array consolidation
- Phase 2 (Year 2): Rollout to Farms servers
- Phase 3 (Year 3): Rollout to Farms workers

6 NAS Status – Actual Rollout
[Chart: actual rollout of Phase 1 (department servers, array consolidation), Phase 2 (Farms servers) and Phase 3 (Farms workers) plotted against Years 1-3]

7 NAS Status – Actual Year 1 Deployment
Q2 2006: Pilot program
- Early adopters
- Timing
- CMS “home” area evaluation
- Fermigrid NFS issues

8 NAS Status – Actual Year 1 Deployment (cont.)
Q3 2006: Production
- Phase 1 in full production
- CMS + Fermigrid go to production (Phase 2)
- Additional NAS heads purchased (RHEA)
- Year 1 projection revised to 200TB deployed storage

9 NAS Status – Actual Year 1 Deployment (cont.)
Q4 2006: CMS and Fermigrid deploy to worker nodes (Phase 3)
Q1/Q2 2007:
- D0/CDF/Miniboone begin consolidation of servers into central NAS service
- Requests for space from LHC, ILC and SDSS

10 NAS Status NAS Storage Growth Year 1

11 NAS Status Q3 2007 storage deployment @ 425-905TB

12 NAS Status – Current Customers
Experiments: CMS, CDF, D0, FermiGrid/OSG, Miniboone, ILC, LHC, SDSS, Sciboone(?)
Departments: CD, Directorate, FESS, ES&H, PPD, VMS
Services: Scientific Linux (FERMI), CVS, KITS, Alphaflow, Enstore

13 NAS Status – Benefits
Stability – a savings multiplier:
- Effort redirected towards supporting applications
- Reduced downtime
- Increased productivity
Consolidation (30+ servers/storage arrays):
- Reduced equipment support costs
- Reduced power and cooling

14 NAS Status – Benefits (cont.)
Ease of use: familiar storage solution – minimal training
Flexibility: choice of storage tiers and price points

15 NAS Status – Challenges
Growth higher than expected – LUN limits:
- Each cluster is limited to 256 LUNs
- Each LUN is limited to 2TB
- Upgrade to 64TB LUN support expected EOY 2008
Criticality of service:
- Central location
- Offsite DR required?
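The LUN limits above translate directly into a hard capacity ceiling. A back-of-the-envelope sketch using only the figures from these slides (the two-cluster count comes from slide 3) shows why the projected Q3 2007 deployment presses against it:

```python
# Illustrative arithmetic only, using figures stated on the slides.
luns_per_cluster = 256   # per-cluster LUN limit
tb_per_lun = 2           # 2TB cap per LUN
clusters = 2             # Fermi-Blue + RHEA

per_cluster_cap_tb = luns_per_cluster * tb_per_lun   # 512 TB per cluster
total_cap_tb = per_cluster_cap_tb * clusters         # 1024 TB across both

print(per_cluster_cap_tb, total_cap_tb)  # 512 1024
```

With Q3 2007 deployment projected at 425-905TB, the upper end already exceeds a single cluster's 512TB ceiling, which is why the 64TB-LUN upgrade matters.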

16 NAS Status – Challenges (cont.)
Backup of large data areas is an issue:
- Areas >5TB
- Millions of files
Logistics:
- Power
- Floor space

17 NAS Status – FY08 Plans
- Expansion of service
- Participate in Tier 3 evaluation
- Development of better reporting tools

18 NAS Status
More info: http://computing.fnal.gov/nasan/bluearc.html
Questions?

19 SAN Status
- 272 Fibre Channel ports; 128 ports added to fabric in ’07 (CMS contribution)
- Qlogic switches
- 2Gb Fibre Channel connections

20 SAN Status
23 storage arrays:
- 12 centrally managed
  - Database array (3PAR) purchased and tested; D0ora2 deployment 7/2/2007
  - Starting to retire 1st generation Tier 2 storage arrays (Infortrend)
- 11 externally managed

21 SAN Status
346TB total:
- 156TB centrally managed
- 190TB externally managed

22 SAN Status
SAN fabric opened up to external members: CMS, CDF, D0, Miniboone
Must retire LSI storage array:
- End of support (year end 2007)
- Impacts IMAP/POP, AFS, DSG (CDF)

23 SAN Status – FY07 Plans
Additional HDS array:
- NAS storage for SDSS, Windows migration, DSG
- Block storage for LSI migration

24 SAN Status – FY07 Plans (cont.)
Purchase 2 Nexsan SATAbeasts:
- Replace 4 Infortrend arrays
- Backup cache disk, DSG RMAN disks
- Test as possible Tier 3 candidates

25 SAN Status – FY08 Plans
- Additional capacity for 3PAR (for sparing, DSG migration)
- Additional capacity for NAS
- Decommission remaining Infortrend arrays
- Other Tier 3 alternatives (next-generation HDS, DDN)
- Virtualization across arrays

26 SAN Status Questions?

27 Site Backup Status
Service entering its 4th year 10/07:
- 2 backup servers: chasm (infrastructure and business), canyon (experiment)
- 1 library (600 slots)
- 8 SAIT-1 tape drives
- 2 Infortrend storage arrays
- TiBS backup software

28 Site Backup Status
- 22TB+ of data
- 12,700+ backup volumes: 5,506 UNIX/Windows, 7,171 AFS, 25 NDMP
- 452+ clients
- 18.5% increase in the past 6 months (3.7TB)
- No single volume > 100GB
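The growth figures above are internally consistent, and give a rough sense of the trajectory. A small sanity-check sketch (illustrative arithmetic only, using just the numbers on this slide; the annualized rate is a hypothetical extrapolation, not a slide claim):

```python
# Sanity-check the slide's growth figures (sketch only).
growth_tb = 3.7       # growth over the past 6 months
growth_pct = 0.185    # stated 18.5% increase

base_tb = growth_tb / growth_pct      # implied size 6 months ago: 20.0 TB
current_tb = base_tb + growth_tb      # ~23.7 TB, in line with "22TB+"

# If that 6-month rate were sustained, annual growth would compound:
annual_factor = (1 + growth_pct) ** 2   # ~1.40x per year

print(round(base_tb, 1), round(current_tb, 1), round(annual_factor, 2))
```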

29 Site-Backup Status
Typical daily backup timeline (canyon), 24-hour window:
- 6:00 PM – Incrementals/network backups
- 2:00 AM – Merges
- 1:00 PM – Retry
- 1:40 PM – Debug

30 Site-Backup Status – Issues
Resolving client backup issues:
- High client volatility (reconfigurations/renamings/reinstalls)
- Large deltas in data
- Contacting admins
- Slow client network performance

31 Site-Backup Status – Issues (cont.)
Merge problems:
- Can be difficult to debug
- Tape drive, software, or a combination
Cache disk:
- Multiple disk failures

32 Site-Backup Status – Issues (cont.)
SAIT-1 drive performance issues:
- Tapes written on one drive are slow to read on another
- Long debug time (>1 hour); usually requires multiple replacements
- Sony and Spectra investigating
- Too few drives

33 Site-Backup Status – FY07 Plans
[Diagram: chasm and canyon backup servers with IP clients, disk cache, SAIT-1 drives, SAN, LTO-4 and NDMP]
- Migrate more backups to NDMP (relieve pressure on chasm)
- Migrate clients from canyon to chasm (relieve pressure on canyon)

34 Site-Backup Status – FY07 Plans (cont.)
Upgrade cache disks:
- Replace aging Infortrend disks
- Higher-performing array
- RAID 6

35 Site-Backup Status – Challenges
Desire from users to expand backups:
- Larger backup volumes
- Larger backup sets

36 Site-Backup Status – FY08 Plans
Upgrade servers to Solaris 10:
- Faster IP stack and filesystem
Upgrade server hardware:
- Faster bus speed
Utilize faster cache disk:
- Take advantage of the faster filesystem
- Feed faster tape drives
Migrate canyon backups to LTO-4

37 Site-Backup Status – FY08 Plans (cont.)
Investigate a disk-based library:
- TiBS-specific implementation
- Uses common disks as a disk library
- Synchronous copy to tape as well
- Faster restores, possibly faster backups
- May increase overall backup system throughput

38 Site-Backup Status – FY08 Plans (cont.)
Investigate a virtual tape library:
- Agnostic solution (not TiBS-specific)
- Asynchronous copy to tape
- Emulates tape drives and libraries
- Faster restores and backups
- Will increase overall backup system throughput
- Some systems offer data deduplication (inline or post-process)

39 Site-Backup Status
More information: http://computing.fnal.gov/site-backups
Questions?

40 AFS Status
- 12 AFS servers, ~17TB of storage
- Largest customers: Minos and Web
- Roughly 8-10% increase per year (based on the number of volumes)
- Must migrate servers off the LSI storage array and onto HDS Tier 2 storage
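Taking the slide's ~17TB and 8-10% annual growth at face value, a simple compounding projection (a sketch, not a slide figure) shows AFS demand growing only modestly over the FY08 planning horizon compared with the NAS service:

```python
# Hypothetical projection from the slide's stated figures (sketch only).
current_tb = 17.0
low, high = 0.08, 0.10   # stated 8-10% annual growth

for year in range(1, 4):
    lo = current_tb * (1 + low) ** year
    hi = current_tb * (1 + high) ** year
    print(f"Year {year}: {lo:.1f}-{hi:.1f} TB")
```

Even at the high end this stays under 23TB after three years, consistent with the slides treating AFS as a steady service rather than a growth area.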

41 AFS Status – FY07 Plans
Migrate data to HDS Tier 2 disks:
- Migration partially complete (1.8TB installed)
- Tier 2 storage re-allocated to NAS due to high demand
Test a Solaris 10 AFS server with ZFS

42 AFS Status – FY08 Plans
Upgrade servers to Solaris 10:
- Faster OS – filesystem and IP stack
- Newer CPUs – lower power
- Dual power supplies
Upgrade OpenAFS:
- Multi-domain support
- Support for >2GB files
- Promote RO copies to RW copies

43 AFS Status
More information: http://computing.fnal.gov/nasan/afs.html
Questions?

