1 Mass Storage @ RHIC Computing Facility
  Razvan Popescu - Brookhaven National Laboratory

2 Overview
- Data types:
  – Raw: very large volume (xPB), moderate bandwidth (50MB/s).
  – DST: moderate volume (x00TB), high bandwidth (x00MB/s).
  – mDST: low volume (x0TB), high bandwidth (x00MB/s).
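
A quick sanity check on those orders of magnitude (my arithmetic, not the slide's): a sustained 50MB/s raw stream accumulates petabytes over a year or two of running, consistent with the xPB raw-volume figure.

```python
# Sanity check (not from the talk): what 50MB/s of raw data adds up to.
SECONDS_PER_YEAR = 365 * 24 * 3600
raw_mb_per_year = 50 * SECONDS_PER_YEAR      # MB accumulated per year
raw_pb_per_year = raw_mb_per_year / 1e9      # 1 PB = 1e9 MB (decimal units)
print(f"~{raw_pb_per_year:.1f} PB/year at a sustained 50MB/s")
# ~1.6 PB/year: consistent with the "xPB" raw-volume estimate above,
# assuming continuous running; real duty cycles would lower this.
```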

3 Data Flow (generic)
(diagram) Generic data flow among the RHIC experiments, the HPSS archive, the reconstruction farm (Linux), the DST/mDST file servers, and the analysis farm (Linux); the raw, DST, and mDST streams are labeled at 35MB/s, 50MB/s, 10MB/s, 200MB/s, and 400MB/s.

4 (diagram-only slide; no transcript text)

5 Present resources
- Tape storage:
  – (1) STK Powderhorn silo (6000 cartridges).
  – (11) SD-3 (Redwood) drives.
  – (10) 9840 (Eagle) drives.
- Disk storage:
  – ~8TB of RAID disk: 1TB for HPSS cache, 7TB Unix workspace.
- Servers:
  – (5) RS/6000 H50/70 for HPSS.
  – (6) E450 & E4000 for file serving and data mining.

6 The HPSS Archive
- Constraints - large capacity & high bandwidth:
  – Two tape technologies: SD-3 (best $/GB) & 9840 (best $/(MB/s)).
  – Two-tape-layer hierarchies; easy management of the migration.
- Reliable and fast disk storage:
  – FC-attached RAID disk.
- Platform compatible with HPSS:
  – IBM, Sun, SGI.
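
A minimal sketch of the routing policy this division of labor suggests, assuming bulk raw data goes to the capacity-optimized SD-3 hierarchy and frequently re-read DST/mDST data to the bandwidth-optimized 9840 hierarchy; the mapping itself is an illustrative assumption, since the talk gives only the cost rationale:

```python
# Illustrative only: routing data types to the two tape hierarchies.
# The mapping is an assumption; the slide states only the cost rationale
# (SD-3: best $/GB, 9840: best $/(MB/s)).
HIERARCHY_STRENGTH = {
    "SD-3": "capacity",   # best $/GB -> bulk, rarely re-read data
    "9840": "bandwidth",  # best $/(MB/s) -> frequently re-read data
}

def pick_hierarchy(data_type: str) -> str:
    """Choose a tape hierarchy for an archive-bound file (assumed policy)."""
    return "SD-3" if data_type == "raw" else "9840"

for dt in ("raw", "DST", "mDST"):
    h = pick_hierarchy(dt)
    print(f"{dt:>4} -> {h} ({HIERARCHY_STRENGTH[h]})")
```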

7 HPSS Structure
- (1) Core server:
  – RS/6000 Model H50.
  – 4x CPU, 2GB RAM.
  – Fast Ethernet (control).
  – Hardware RAID (metadata storage).

8 HPSS Structure
- (3) Movers:
  – RS/6000 Model H70.
  – 4x CPU, 1GB RAM.
  – Fast Ethernet (control).
  – Gigabit Ethernet (data), 1500 & 9000 byte MTU.
  – 2x FC-attached RAID (300GB) disk cache.
  – (3-4) SD-3 "Redwood" tape transports.
  – (3-4) 9840 "Eagle" tape transports.

9 HPSS Structure
- Guarantee availability of resources for a specific user group => separate resources (separate PVRs & movers).
- One mover per user group => total exposure to single-machine failure.
- Guarantee availability of resources for the Data Acquisition stream => separate hierarchies.
- Result: 2 PVRs & 2 COSs & 1 mover per group.
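
A toy illustration of the per-group partitioning this slide describes - two PVRs (one per tape technology), two classes of service, and one dedicated mover. Group names are hypothetical:

```python
# Illustrative only: the "2 PVRs & 2 COSs & 1 mover per group" layout from
# slide 9. Group names are hypothetical. The trade-off: a mover failure
# touches exactly one group, at the price of no failover for that group.
def group_resources(group: str) -> dict:
    return {
        "pvrs":   [f"{group}-pvr-sd3", f"{group}-pvr-9840"],  # one PVR per tape technology
        "cos":    [f"{group}-cos-sd3", f"{group}-cos-9840"],  # one class of service each
        "movers": [f"{group}-mover"],                         # dedicated single mover
    }

for g in ("star", "phenix"):  # hypothetical experiment/user groups
    print(g, group_resources(g))
```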

10 HPSS topology
(diagram) The core server and movers M1-M3 sit on two networks: Net 1 - data (1000baseSX) and Net 2 - control (100baseT). The STK silo is reached over 10baseT by N x PVR, and a pftpd client reaches the data network via routing.

11 HPSS Performance
- 80MB/s for the disk subsystem.
- 1 CPU per 40MB/s of TCP/IP (Gbit) traffic at 1500 byte MTU.
- ~8MB/s per SD-3 transport.
- ? MB/s per 9840 transport (not yet measured).
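
Putting the slide-8 and slide-11 numbers together (my arithmetic, assuming the 1-CPU-per-40MB/s rule scales linearly) shows why a 4-CPU mover is CPU-bound on networking long before its hardware runs out:

```python
# Back-of-the-envelope from slides 8 and 11 (assumes linear CPU scaling).
CPUS_PER_MOVER = 4       # RS/6000 H70 movers (slide 8)
MB_S_PER_CPU   = 40      # TCP/IP over Gbit Ethernet, 1500 byte MTU (slide 11)
DISK_MB_S      = 80      # measured disk subsystem throughput (slide 11)
SD3_MB_S       = 8       # per SD-3 transport (slide 11)
SD3_PER_MOVER  = 4       # 3-4 SD-3 transports per mover (slide 8)

net_ceiling = CPUS_PER_MOVER * MB_S_PER_CPU   # 160 MB/s if every CPU does TCP/IP
tape_rate   = SD3_PER_MOVER * SD3_MB_S        # ~32 MB/s aggregate SD-3 bandwidth
print(f"network ceiling {net_ceiling} MB/s, disk {DISK_MB_S} MB/s, SD-3 tape ~{tape_rate} MB/s")
# Serving the full 80 MB/s disk rate costs 2 of the 4 CPUs in TCP/IP
# alone - the expense flagged again on slide 16 for inter-mover traffic.
```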

12 I/O intensive systems
- Mining and analysis systems.
- High I/O & moderate CPU usage.
- To avoid large network traffic => merge file servers with HPSS movers:
  – Major problem: HPSS support on non-AIX platforms.
  – Several (Sun) SMP machines or a large (SGI) modular system.

13 I/O intensive systems
- (6) NFS file servers for work areas:
  – (5) E450 + (1) E4000.
  – 4(6) CPUs; 2GB RAM; Fast/Gbit Ethernet.
  – 2x FC-attached hardware RAID - 1.5TB.
- (1) NFS home directory server (E450).
- (3+3) AFS servers (code development & home directories):
  – RS/6000 models E30 and 43P.
- (NFS to AFS migration.)

14 Problems
- Short life cycle of the SD-3 heads:
  – ~500 hours, i.e. under 2 months at average usage (6 of 10 drives in 10 months).
- Low-throughput interface (F/W) on the SD-3 -> high slot consumption.
- 9840: too early to tell.
- HPSS closes the tape cartridge on a transport error:
  – Built a monitoring tool to try to predict transport failures (based on soft-error frequency; see the sketch below).
- SFS response when heavily loaded: no graceful failure (timeouts & lost connections).
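
The talk doesn't describe the monitoring tool's internals; the following is a minimal sketch of the soft-error-frequency idea, with the look-back window and alarm threshold invented for illustration:

```python
# Illustrative sketch of soft-error-based failure prediction (slide 14).
# The window size and alarm threshold are invented for the example; the
# actual RCF tool's parameters were not given in the talk.
from collections import deque

class TransportMonitor:
    """Flag a tape transport whose soft-error rate is climbing."""
    def __init__(self, window_s: float = 3600.0, max_errors: int = 5):
        self.window_s = window_s      # look-back window (assumed: 1 hour)
        self.max_errors = max_errors  # alarm threshold (assumed)
        self.events = deque()         # timestamps of recent soft errors

    def record_soft_error(self, now: float) -> bool:
        """Record one soft error; return True if the drive looks suspect."""
        self.events.append(now)
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()     # expire errors outside the window
        return len(self.events) >= self.max_errors

mon = TransportMonitor()
for t in (0, 600, 1200, 1800, 2400):  # simulated burst, one error per 10 min
    if mon.record_soft_error(float(t)):
        print(f"t={t}s: pull the drive for service before a hard failure")
```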

15 Issues
- Partially tested two-tape-layer hierarchies:
  – Cartridge-based migration.
  – Manually scheduled reclaim.
- Integration of file server and mover functions on the same node:
  – Solaris mover port.
  – No longer an objective.

16 Issues
- Guarantee availability of resources for specific user groups:
  – Separate PVRs & movers.
  – Total exposure to single-machine failure!
- Reliability:
  – Distribute resources across movers => share movers (acceptable?).
  – Inter-mover traffic: 1 CPU per 40MB/s of TCP/IP per adapter: expensive! (See the cost sketch below.)
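
To see why shared movers make inter-mover traffic the dominant cost, here is a rough model (my construction: it assumes one CPU-share per mover-side network endpoint, at the slide's 40MB/s-per-CPU rate, for an 80MB/s stream):

```python
# Rough cost model for inter-mover traffic (my assumptions, built on the
# 1 CPU per 40MB/s figure from slides 11 and 16).
MB_S_PER_CPU = 40

def mover_cpus(rate_mb_s: float, mover_endpoints: int) -> float:
    """CPUs burned on TCP/IP across all movers touched by one stream:
    one share per mover-side send or receive endpoint."""
    return mover_endpoints * rate_mb_s / MB_S_PER_CPU

# Direct:      mover1 -> client            (1 mover endpoint: m1 send)
# Inter-mover: mover1 -> mover2 -> client  (3: m1 send, m2 recv, m2 send)
direct = mover_cpus(80, mover_endpoints=1)
routed = mover_cpus(80, mover_endpoints=3)
print(f"direct: {direct:.0f} CPUs; via a second mover: {routed:.0f} CPUs")
# Tripling the per-stream CPU bill motivates the alternatives on slide 17
# (affinity, diskless hierarchies, SP switch, HIPPI, SAN).
```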

17 Inter-mover traffic - solutions
- Affinity:
  – Limited applicability.
- Diskless hierarchies:
  – Not for SD-3; not tested on 9840.
- High-performance networking: SP switch.
  – IBM only.
- Lighter protocol: HIPPI.
  – Expensive hardware.
- Multiply attached storage (SAN):
  – Requires HPSS modifications.

18 Multiply Attached Storage (SAN)
(diagram) Mover 1 and Mover 2 are both attached to the same storage; the client's data can move over path 1, through the movers, or over path 2, directly from the multiply attached device.

19 Summary
- Problems with divergent requirements:
  – Cost-effective archive capacity and bandwidth:
    Two tape hierarchies: SD-3 & 9840. Test the configuration.
  – Availability and reliability of HPSS resources:
    Separated COS and shared movers. Inter-mover traffic?!
- Merger of file servers and HPSS movers?

