
1 GPFS & StoRM
Jon Wakelin
University of Bristol

2 Pre-Amble
GPFS Basics
  – What it is & what it does
GPFS Concepts
  – More in-depth technical concepts
  – GPFS Topologies
HPC Facilities at Bristol
  – How we are using GPFS
  – Creating a mock-up/staging-service for GridPP
StoRM
Recap & References

3 GPFS Basics
IBM's General Parallel File System
  – Scalable high-performance parallel file system
  – Numerous HA features
  – Life-cycle Management Tools
  – Provides POSIX and extended interfaces to data
Available for AIX and Linux
  – Only supported on AIX, RHEL and SuSE
  – Installed successfully on SL3.x (ask me if you are interested)
  – GPFS can run on a mix of these OSs
Pricing: per processor
  – Free version available through IBM's Scholars program
  – Currently developing new Licensing model

4 GPFS Basics
Provides High-performance I/O
  – Divides files into blocks and stripes the blocks across disks (on multiple storage devices)
      Reads/writes the blocks in parallel
      Tuneable block sizes (depends on your data)
  – Block-level locking mechanism
      Multiple applications can access the same file concurrently
      Multiple editors can work on different parts of a single file simultaneously. This eliminates the additional storage, merging and management overhead typically required to maintain multiple copies
  – Client-side data caching
      Where is data cached?
Multi-Cluster Configuration
  – Join GPFS clusters together
  – Encrypted data and authentication, or just authentication
      openssl and keys
  – Different security contexts (root squash à la NFS)
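
The block striping described on this slide can be illustrated with a small sketch. It assumes a plain round-robin placement of fixed-size blocks across disks; the slides do not say how GPFS actually chooses a disk for each block, so `stripe_layout`, `blocks_per_disk`, the block size and the disk count are all hypothetical illustrations rather than GPFS internals.

```python
def stripe_layout(file_size, block_size, num_disks):
    """Map each block of a file to a disk index.

    Round-robin placement is an assumption made for illustration only.
    """
    if block_size <= 0 or num_disks <= 0:
        raise ValueError("block_size and num_disks must be positive")
    num_blocks = -(-file_size // block_size)  # ceiling division
    return [(block, block % num_disks) for block in range(num_blocks)]


def blocks_per_disk(layout, num_disks):
    """Count how many blocks landed on each disk."""
    counts = [0] * num_disks
    for _block, disk in layout:
        counts[disk] += 1
    return counts


# A 10 MiB file, 1 MiB blocks, 4 disks (all values hypothetical).
layout = stripe_layout(file_size=10 * 1024 * 1024,
                       block_size=1024 * 1024,
                       num_disks=4)
print(layout)
print(blocks_per_disk(layout, 4))
```

Because consecutive blocks sit on different disks, a large sequential read or write can keep several disks busy at once, which is the point of striping.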

5 GPFS Basics
Information Life-cycle Management
  – Tiered storage
      Create groups of disks within a file system, based on reliability, performance, location, etc.
  – Policy-driven automation
      Automatically move, delete or replicate files, based on filename, username, or fileset
      e.g. Keep newest files on fastest hardware, migrate them to older hardware over time
      e.g. Direct files to appropriate resource upon creation
Other notable points
  – Can specify user, group and fileset quotas
  – POSIX and NFS v4 ACL support
  – Can specify different IPs for GPFS and non-GPFS traffic
  – Maximum limit of 268 million disks (2048 is default max)
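
The idea of policy-driven placement can be sketched as an ordered list of rules where the first matching rule decides what happens to a file. This is not GPFS policy syntax: the `FileInfo` fields, the pool names `fast` and `slow`, the 30-day threshold, the `.tmp` rule and the first-match-wins ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    owner: str
    age_days: int


# Each rule is (predicate, action). Rules are tried in order and the
# first match wins (an assumed evaluation order, not taken from GPFS).
RULES = [
    (lambda f: f.name.endswith(".tmp") and f.age_days > 7, "delete"),
    (lambda f: f.age_days <= 30, "pool:fast"),
    (lambda f: True, "pool:slow"),  # default: older files go to older hardware
]


def apply_policy(file_info, rules=RULES):
    """Return the action of the first rule whose predicate matches."""
    for predicate, action in rules:
        if predicate(file_info):
            return action
    raise LookupError("no rule matched")


for f in [FileInfo("run01.dat", "jon", 2),
          FileInfo("run00.dat", "jon", 400),
          FileInfo("scratch.tmp", "jon", 10)]:
    print(f.name, "->", apply_policy(f))
```

Keeping the rules as data rather than code paths mirrors the slide's point: the administrator states the policy once and the file system applies it automatically.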

6 GPFS Topologies
SAN-Attached
  – All nodes are physically attached to all NSDs
  – High performance but expensive!

7 GPFS Topologies
Network Shared Disk (NSD) Server
  – Subset of nodes are physically attached to NSDs
  – Other nodes forward their IO requests to the NSD servers which perform the IO and pass back the data

8 [Diagram: nodes each running an application on GPFS on Linux, some as NSD clients and some as NSD servers, connected by a Local Area Network]
In practice, often have a mixed NSD + SAN environment
  – Nodes use SAN if they can and NSD servers if they can't
  – If SAN connectivity fails a SAN-attached node can fall back to using remaining NSD servers
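
The fallback behaviour in a mixed SAN + NSD environment can be sketched as a simple path choice. The function `choose_io_path`, the server names and the "first server that is up" ordering are hypothetical; the slides only state that a node prefers its SAN link and falls back to the remaining NSD servers.

```python
def choose_io_path(has_san_link, nsd_servers):
    """Pick an I/O path for one node.

    has_san_link: True if the node currently has working SAN connectivity.
    nsd_servers:  ordered list of (server_name, is_up) pairs.
    Returns ("san", None) or ("nsd", server_name).
    """
    if has_san_link:
        return ("san", None)
    for name, is_up in nsd_servers:
        if is_up:
            return ("nsd", name)
    raise RuntimeError("no path to disk: SAN link down and no NSD server up")


servers = [("nsd-primary", False), ("nsd-secondary", True)]  # hypothetical names
print(choose_io_path(True, servers))    # SAN-attached node with a healthy link
print(choose_io_path(False, servers))   # SAN link lost, or node never had one
```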

9 GPFS Redundancy & HA
Non-GPFS
  – Redundant power supplies
  – Redundant hot swap fans
  – …
  – RAID with hot swappable disks (multiple IBM DS4700s)
  – FC with redundant paths (GPFS knows how to use this)
HA Features in GPFS
  – Primary and secondary Configuration Servers
  – Primary and secondary NSD Servers for each Disk
  – Replicate Metadata
  – Replicate data
  – Failure Groups
      Specify which machines share a single point of failure
      GPFS will use this info to make sure that replicas of the same data are not placed in the same failure group
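
A minimal sketch of failure-group-aware replica placement, assuming the goal is that copies of the same data never share a failure group. The function `place_replicas`, the disk names and the group numbers are hypothetical, and the greedy selection is an illustration rather than GPFS's actual allocator.

```python
def place_replicas(disks, copies=2):
    """Choose `copies` disks that all belong to different failure groups.

    disks: dict mapping disk name -> failure group id.
    Disks are considered in sorted name order (an arbitrary, assumed order).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    chosen = []
    used_groups = set()
    for disk, group in sorted(disks.items()):
        if group in used_groups:
            continue  # this group already holds a copy
        chosen.append(disk)
        used_groups.add(group)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct failure groups for the requested copies")


# Two disks behind each of two controllers (hypothetical layout).
disks = {"d1": 1, "d2": 1, "d3": 2, "d4": 2}
print(place_replicas(disks, copies=2))
```

With this layout, losing everything in failure group 1 still leaves one copy in group 2.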

10 GPFS Quorum
Quorum
  – A majority of the nodes must be present before access to shared disks is allowed
  – Prevents subgroups making conflicting decisions
  – In the event of a failure, disk access is suspended in the minority and continues in the majority
Quorum Nodes
  – These nodes are counted to determine if the system is quorate
  – If the system is no longer quorate GPFS unmounts the filesystem, waits until quorum is established, and then recovers the FS
Quorum Nodes with Tie-Breaker Disks
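
The majority rule can be written down in a few lines. This sketch covers only simple node quorum (strictly more than half of the quorum nodes reachable); tie-breaker disks are not modelled, and the function name `is_quorate` is hypothetical.

```python
def is_quorate(quorum_nodes_total, quorum_nodes_up):
    """True if strictly more than half of the quorum nodes are up."""
    if not 0 <= quorum_nodes_up <= quorum_nodes_total:
        raise ValueError("quorum_nodes_up must be between 0 and the total")
    return 2 * quorum_nodes_up > quorum_nodes_total


# A network split of 3 quorum nodes into partitions of 2 and 1:
print(is_quorate(3, 2))  # majority side keeps running
print(is_quorate(3, 1))  # minority side must suspend

# With an even number of quorum nodes, an equal split leaves neither side quorate:
print(is_quorate(4, 2))
```

Because at most one partition can hold a strict majority, two halves of a split cluster can never both keep writing to the shared disks.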

11 GPFS Performance
Preliminary results using
    time dd if=/dev/zero of=testfile bs=1k count=2000000
Multiple write processes on same node
    1 process      90 MB/s
    2 processes    51 MB/s
    4 processes    18 MB/s
Multiple write processes from different nodes
    1 process      90 MB/s
    2 processes    58 MB/s
    4 processes    28 MB/s
    5 processes    23 MB/s

12 GPFS Performance
In a hybrid environment (SAN-attached and NSD Server nodes)
  – Read/Writes from SAN-attached nodes place little load on the NSD servers
  – Read/Writes from other nodes place a high load on the NSD servers
SAN-attached
    [root@bf39 gpfs]# time dd if=/dev/zero of=file_zero count=2048 bs=1024k
    real 0m31.773s
    [root@bf40 GPFS]# top -p 26651
    26651 root 0 -20 1155m 73m 7064 S 0 1.5 0:10.78 mmfsd
Via NSD Server
    [root@bfa-se /]# time dd if=/dev/zero of=/gpfs/file_zero count=2048 bs=1024k
    real 0m31.381s
    [root@bf40 GPFS]# top -p 26651
    26651 root 0 -20 1155m 73m 7064 S 34 1.5 0:10.78 mmfsd
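
The two `dd` timings on this slide can be turned into throughput figures with a little arithmetic. The sketch assumes that `bs=1024k` means 1024 × 1024 bytes per block (as in GNU `dd`) and that the `real` time covers the whole write; the helper name `dd_throughput_mib_s` is hypothetical.

```python
MIB = 1024 * 1024


def dd_throughput_mib_s(block_size_bytes, count, elapsed_s):
    """Throughput in MiB/s for a dd run writing `count` blocks."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    total_bytes = block_size_bytes * count
    return total_bytes / elapsed_s / MIB


# count=2048, bs=1024k, with the 'real' times reported on the slide.
san = dd_throughput_mib_s(1024 * 1024, 2048, 31.773)
nsd = dd_throughput_mib_s(1024 * 1024, 2048, 31.381)

print(f"SAN-attached:   {san:.1f} MiB/s")
print(f"Via NSD server: {nsd:.1f} MiB/s")
```

Both runs write 2 GiB and come out at roughly 64 to 65 MiB/s, so the client sees almost the same throughput either way. The difference is in the `top` output: the `mmfsd` daemon on the NSD server shows 0 in the %CPU column for the SAN-attached write and 34 for the write routed through it.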

13 Bristol HPC Facilities
Bristol, IBM, ClearSpeed and ClusterVision
  – BabyBlue: installed Apr 2007
  – Currently undergoing acceptance trials
  – BlueCrystal ~Dec 2007
Testing
  – A number of pump-priming projects have been identified
  – Majority of users will develop, or port code, directly on the HPC system
      Only make changes at the Application level
  – GridPP
      System level changes
      Pool accounts, World-addressable Slaves, NAT, Run services and daemons
Instead we will build a testing/staging system for GridPP
  – In-house and loan equipment from IBM
  – Reasonable analogue of HPC facilities
  – No InfiniBand (but you wouldn't use it anyway)

14 Bristol HPC Facilities
BabyBlue
  – Torque/Maui, SL 4 Worker Node, RHEL4 (maybe AIX) on Head-Nodes
  – IBM 3455, 96 dual-core, dual-socket 2.6GHz, AMD Opterons
      4? ClearSpeed Accelerator board
  – 8GB RAM per node (2GB per core)
  – IBM DS4700 + EXP810, 15TB Transient storage
      SAN/FC network running GPFS
BlueCrystal – c. Dec 2007
  – Torque/Moab
  – 512 dual-core, dual-socket nodes (or quad-core depending on timing)
  – 8GB RAM per node (1GB or 2GB per core)
  – 50TB Storage, SAN/FC Network running GPFS
Server Room
  – 48 water cooled APC racks – 18 will be occupied by HPC, Physics Servers may be co-located
  – 3 x 270kW chillers (space for 3 more)

15 GPFS BabyBlue

16 GPFS MiniBlue
[Diagram labels: p-NSD, quorum, p-Config; s-NSD, quorum, s-Config; quorum; IBM DS4500 – Configure hot spares]

17 StoRM
StoRM is a storage resource manager for disk based storage systems.
  – Implements the SRM interface version 2.2
  – StoRM is designed to support guaranteed space reservation and direct access (using native POSIX I/O calls)
  – StoRM takes advantage of high performance parallel file systems
      GPFS, XFS and Lustre???
      Also standard POSIX file systems are supported
  – Direct access to files from Worker Nodes
      Compare with Castor, dCache and DPM

18 StoRM architecture
Front end (FE):
  – Exposes the web service interface
  – Manages user authentication
  – Sends the request to the BE
Database (DB):
  – Stores SRM request and status
  – Stores file and space information
Back end (BE):
  – Binds with the underlying file systems
  – Enforces authorization policy on files
  – Manages SRM file and space metadata

19 StoRM miscellaneous
Scalability and high availability.
  – FE, DB, and BE can be deployed on different machines
  – StoRM is designed to be configured with n FE and m BE, using a common DB
Installation (relatively straightforward)
  – RPM & Yaim (FE, BE and DB all on one server)
  – Additional manual configuration steps
      e.g. namespace.xml, Information Providers
  – Not completely documented yet
  – Mailing list
CNAF x2 and Bristol
  – Basic tests -
  – Use Case tests -
  – Currently still differences between Bristol and CNAF installations

20 StoRM usage model

21 Summary
GPFS
  – Scalable high-performance file system
  – Highly Available, built on redundant components
  – Tiered storage or multi-cluster configuration for GridPP work
HPC
  – University wide facility – not just for PP
  – GridPP requirements rather different from general/traditional HPC users
  – Build an analogue of the HPC system for GridPP
StoRM
  – Better performance because StoRM builds on
  – Also, more appropriate data transfer model – POSIX and file protocol

22 References
GPFS
  – s/gpfsclustersfaq.pdf
StoRM
