IBM Spectrum Scale (formerly GPFS)


1 IBM Spectrum Scale (formerly GPFS)
A cluster file system offering high performance, high availability, and parallel file access.

Elastic Storage V1 will be supported on Linux on System z. It will enable enterprise clients to use a highly available cluster file system with Linux on System z in LPAR or Linux on z/VM. IBM and ISV solutions (e.g. WebSphere MQ or Tivoli Storage Manager) will provide higher value for Linux on System z clients by exploiting Elastic Storage functionality as described below.

Client value:
- A highly available cluster architecture
- Improved data availability through data access even when the cluster experiences storage or node malfunctions
- Capabilities for high-performance parallel workloads
- Concurrent high-speed, reliable file access from multiple nodes in the cluster environment
- Smooth, non-disruptive capacity expansion and reduction
- Services to effectively manage large and growing quantities of unstructured data

Correct branding usage for Elastic Storage:
- We have announced the new code name: Elastic Storage software.
- We are not using "IBM" in front of "Elastic Storage", as this is not a formally named offering.
- In general, use "code name: Elastic Storage" only on first usage, and then simply "Elastic Storage" in subsequent communication.
- Never use "based on GPFS technology" in a slide title. It can appear as a bullet on a page or, more ideally, in a footnote.
- We should not be using "GPFS" as an offering name.

2 Clustered and Distributed File Systems
Clustered file systems: a file system shared by being simultaneously mounted on multiple servers that access the same storage.
Examples: IBM Spectrum Scale, Oracle Cluster File System (OCFS2), Global File System (GFS2), Lustre

Distributed file systems: the file system is accessed through a network protocol, and clients do not share block-level access to the same storage.
Examples: NFS, OpenAFS, CIFS

A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct-attached storage for each node). Clustered file systems can provide features like location-independent addressing and redundancy, which improve reliability or reduce the complexity of other parts of the cluster. Parallel file systems are a type of clustered file system that spreads data across multiple storage nodes, usually for redundancy or performance.

Key differences vs. distributed file systems:
- Cluster file systems provide access from multiple nodes over a SAN. Direct access from the computer over a SAN provides performance advantages over standard network protocols.
- Cluster file systems are tightly coupled and communicate at a more sophisticated level, enabling an application, for example, to have multiple nodes reading and writing a single file.

OCFS2: Oracle Cluster File System 2 is a general-purpose shared-disk cluster file system from Oracle for Linux, capable of providing both high performance and high availability. The file system is currently used in virtualization (Oracle VM), both in the management domain to host virtual machine images and in the guest domain to allow Linux guests to share a file system. It is also used in database clusters (Oracle RAC), middleware clusters (Oracle E-Business Suite), etc.

GFS2: Global File System 2 is a shared-disk file system for Linux computer clusters. GFS2 differs from distributed file systems (such as AFS) because it allows all nodes direct concurrent access to the same shared block storage.

NFS: Network File System is a protocol developed by Sun Microsystems that allows access to files over a network. The data is not transferred; the user accesses the data as if it were stored on the local hard disk.

OpenAFS: an open-source implementation of the Andrew File System. It offers a client-server architecture for federated file sharing and replicated read-only content distribution, providing location independence, scalability, security, and transparent migration capabilities. AFS is available for a broad range of heterogeneous systems including UNIX, Linux, and Windows.

CIFS: Common Internet File System is a network protocol for remote file access. It is a dialect of the SMB protocol and is supported by most Windows servers, many other commercial servers and network-attached storage appliances, and the popular open-source server Samba.

3 IBM Spectrum Scale – IBM’s shared disk, parallel cluster file system
(Slide diagram: applications on Linux running above virtualization management and hardware resources.)

Cluster: fast, reliable communication and a common administrative domain.
Shared disk: all data and metadata reside on storage devices accessible from any node through a block I/O interface ("disk" means any kind of block storage device).
Parallel: data and metadata flow from all of the nodes to all of the disks in parallel.
Features: native encryption, compression, native protocols, cloud tiering, disaster recovery, native RAID.

A cluster file system can be accessed from many computers at the same time over a network or a SAN. The file system is built from a collection of arrays which contain the file system data and metadata. A file system can be built from a single disk or contain thousands of disks storing petabytes of data. Each file system is accessible from all nodes in the cluster. Applications can access files through standard file system interfaces or through enhanced interfaces available for parallel programs. Applications can concurrently read or update a common file from multiple nodes in the cluster; GPFS maintains the coherency and consistency of the file system using sophisticated locking management and logging.

Elastic Storage provides unparalleled performance for large data objects and for large aggregates of smaller objects. It achieves high-performance I/O by:
- Striping data across multiple disks attached to multiple nodes.
- High-performance metadata (inode) scans.
- Supporting a wide range of file system block sizes, configurable by the administrator, to match I/O requirements.
- Utilizing advanced algorithms that improve read-ahead and write-behind I/O operations.
- Using block-level locking, based on a very sophisticated, scalable token management system, to provide data consistency while allowing multiple application nodes concurrent access to the files.

All data stored in the file system is striped across all of the disks within a storage pool. This wide data striping allows for optimal performance from the available storage. GPFS provides unmatched performance and reliability with scalable access to critical file data. It distinguishes itself from other cluster file systems by providing concurrent high-speed file access to applications executing on multiple nodes of a homogeneous or heterogeneous cluster. A minimal illustration of such concurrent access to one shared file follows.
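The sketch below (not from the original slides) shows the access pattern a parallel cluster file system enables: several workers writing disjoint byte ranges of one shared file. On a Spectrum Scale mount the same pattern can run from different nodes, with byte-range (token) locking keeping the writes consistent; the path and sizes here are hypothetical.

```python
# Minimal sketch: several workers write disjoint byte ranges of one shared file.
# The path below assumes a parallel file system mount; adjust for your environment.
import os
from multiprocessing import Pool

PATH = "/gpfs/fs1/shared.dat"      # hypothetical Spectrum Scale mount point
CHUNK = 4 * 1024 * 1024            # 4 MiB per worker
WORKERS = 8

def write_chunk(rank: int) -> int:
    fd = os.open(PATH, os.O_WRONLY)
    try:
        data = bytes([rank % 256]) * CHUNK
        # pwrite targets an absolute offset, so workers never overlap.
        return os.pwrite(fd, data, rank * CHUNK)
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Pre-size the file, then let every worker write its own region in parallel.
    with open(PATH, "wb") as f:
        f.truncate(CHUNK * WORKERS)
    with Pool(WORKERS) as pool:
        written = pool.map(write_chunk, range(WORKERS))
    print(f"wrote {sum(written)} bytes from {WORKERS} workers")
```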

4 IBM Spectrum Scale - The Industry Standard for High Performance, Scalable Storage
Over 17 years in the marketplace.
Broad adoption across many industries:
- Technical computing: government, education
- Enterprise computing: business analytics, financial services, electronic design automation, life sciences, oil & gas exploration, media
Over 3,000 customers worldwide, growing at double-digit rates year over year.
Industry proven, minimizing risk.

5 Spectrum Scale Pioneered Big Data Management
Extreme scalability
- File system: 2^64 files per file system
- Maximum file system size: 2^99 bytes
- Maximum file size equals the file system size
- Customer with a 25 PB file system
- Nodes: 1 to 16,384

Performance
- High-performance metadata
- Striped data
- Parallel file access protocol
- Integrated tiered storage
- Commodity hardware

Proven reliability
- Shipping since 1998
- Built-in cluster availability: automatic failover/failback
- Data replication
- Add/remove on the fly: nodes, storage/disks
- Rolling upgrades
- Administer from any node
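For orientation, a small Python calculation (not part of the original slides) converts these architectural limits into decimal units:

```python
# Quick arithmetic on the limits quoted above (2**64 files, 2**99 bytes).
# Unit conversions only; no Spectrum Scale API involved.
max_files = 2**64
max_bytes = 2**99

YB = 10**24   # yottabyte (decimal)
PB = 10**15   # petabyte (decimal)

print(f"max files per file system : {max_files:.3e}")           # ~1.845e19
print(f"max file system size      : {max_bytes / YB:,.0f} YB")  # ~633,825 YB
print(f"that is {max_bytes / PB:.3e} PB, vs. the 25 PB customer system cited above")
```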

6 Spectrum Scale Enables An Extremely Flexible Architecture
(Slide diagram: sites in Tokyo and NY with storage capacities of 100 TB, 100 TB, 20 PB, and 50 PB.)

Case 1: Two servers with Spectrum Scale software fronting 100 TB of storage.
Case 2: Add performance – add compute nodes and a faster network.
Case 3: Increase capacity – add any storage; virtually unlimited scaling.
Case 4: Global share – use Active File Management to expand your global namespace.

Now, let's play with the architecture a bit. This demonstrates the flexibility of a software-defined storage architecture. Adding new devices (disks or nodes) has no negative impact on data mapping, and new drives and nodes can be added without creating access bottlenecks or hot spots, making system management that much easier.

7 Spectrum Scale Features & Applications
Standard file system interface with POSIX semantics
- Metadata on shared storage
- Distributed locking for read/write semantics
Highly scalable
- High capacity (up to 2^99 bytes of file system size, up to 2^63 files per file system)
- High throughput (TB/s)
- Wide striping
- Large block size (up to 16 MB)
- Multiple nodes write in parallel
Advanced data management
- Snapshots, storage pools, ILM (filesets, policy)
- Backup, HSM (DMAPI)
- Remote replication, WAN caching
High availability
- Fault tolerance (node, disk failures)
- On-line system management (add/remove nodes, disks, ...)

Standard file system interface with POSIX semantics: metadata is kept on shared storage with the same on-disk structure as regular files, but it is not directly visible to the user (it does not appear in any directory). Distributed locking synchronizes updates to file system metadata on disk to prevent corruption and maintains cache consistency of data and metadata cached in memory on different nodes.

Advanced data management: Elastic Storage can help achieve data lifecycle management efficiencies through policy-driven automation and tiered storage management. The use of storage pools, filesets, and user-defined policies provides the ability to better match the cost of the storage to the value of the data. Storage pools let you manage groups of disks within a file system; by grouping disks based on performance, locality, or reliability characteristics, they let you create tiers of storage.

Snapshots, storage pools, Information Lifecycle Management (filesets, policy): not all storage is the same – some is faster, cheaper, or more reliable; not all data are the same – some are more valuable, important, or popular.
- Snapshot: a logical, read-only copy of the file system at a point in time. Typical uses: backup (obtain a consistent state of the file system) and on-line access to a previous file system state.
- Fileset: a partition of the file system namespace (a sub-directory tree). Allows administrative operations at finer granularity than the entire file system, e.g. disk space limits, user/group quotas, snapshots, caching.
- Storage pool: a named collection of disks with similar attributes, intended to hold similar data.
- Policy: a set of user-specified rules that match data to the appropriate pool. For example, a migration policy can move data between pools, change replication, delete data, or run arbitrary user commands. Automated, policy-driven storage management can reduce storage costs by up to 90 percent. (An illustrative sketch of such a tiering decision follows this slide.)

Backup, HSM (DMAPI) (HSM: Hierarchical Storage Manager; DMAPI: Data Management API): external storage is managed by a data management application (e.g. Tivoli HSM) that interacts with the file system through a standardized interface (DMAPI). Infrequently accessed data is moved to external storage (e.g. tape) and restored transparently on demand when needed.

Remote replication, WAN caching: enable sharing file systems between nodes in different clusters; suitable for less reliable, higher-latency networks.

High availability: Elastic Storage is fault tolerant and can be configured for continued access to data even if cluster nodes or storage systems fail. This is accomplished through robust clustering features and support for data replication. Elastic Storage includes a complete set of node availability components to handle data consistency and availability. In an Elastic Storage cluster all nodes see all data, and all cluster operations can be performed by any node with a server license; there are no special nodes, and all nodes are capable of performing all tasks. The file system can be configured so that it remains available automatically if a disk or server fails, and Elastic Storage can be configured to recover automatically from node, storage, and other infrastructure failures. Fault tolerance (node, disk failures); on-line system management (add/remove nodes, disks, ...).
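To make the policy idea concrete, here is a minimal Python sketch. It is not GPFS's actual policy language (which is SQL-like and evaluated by the file system itself); it only mimics the decision a simple age-based migration rule would make. The pool names, 30-day threshold, and path are hypothetical.

```python
# Illustrative only: mimics an age-based "migrate cold files to a cheaper pool" rule.
import os
import time
from pathlib import Path

COLD_AFTER_DAYS = 30  # hypothetical threshold

def choose_pool(path: Path) -> str:
    """Return the pool a simple age-based rule would pick for this file."""
    age_days = (time.time() - path.stat().st_atime) / 86400
    return "nearline" if age_days > COLD_AFTER_DAYS else "system"

def plan_migration(root: str) -> list:
    """List files that the age-based rule would move to the 'nearline' pool."""
    plan = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = Path(dirpath) / name
            if choose_pool(p) == "nearline":
                plan.append(p)
    return plan

if __name__ == "__main__":
    for path in plan_migration("/gpfs/fs1/projects"):  # hypothetical mount point
        print(f"MIGRATE {path} -> POOL 'nearline'")
```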

8 Flexible Topologies for GPFS Cluster Configuration
Network Shared Disk (NSD) server model, shared-nothing cluster (SNC) model, and Storage Area Network (SAN) model.
(Slide diagram: application nodes and I/O servers connected to storage over TCP/IP or InfiniBand RDMA networks, or directly over a storage network.)

GPFS supports three topologies: SAN, NSD, and SNC. In all three configurations, all the features of the global file system are maintained, including the global namespace and sharing. Software enables this flexibility.

9 What Spectrum Scale is NOT
Not a client-server file system like NFS, CIFS, or AFS: no single-server performance and scaling bottlenecks.
(Slide diagrams: client nodes reaching data and metadata on storage through a single file server, or through a single metadata server, over a TCP/IP network.)

When large numbers of clients want to access the data, or if the data set grows too large, an NFS server quickly becomes the bottleneck and significantly impacts system performance, because the NFS server sits in the data path between the client computer and the physical storage devices. An Elastic Storage file system can be built from a single disk or contain thousands of disks storing petabytes of data. Each file system is accessible from all nodes in the cluster.

No centralized metadata server: Elastic Storage provides scalable metadata management by allowing all nodes of the cluster that access the file system to perform file metadata operations. This key and unique feature distinguishes Elastic Storage from other cluster file systems, which typically have a centralized metadata server handling fixed regions of the file namespace. A centralized metadata server can often become a performance bottleneck for metadata-intensive operations and can represent a single point of failure. Elastic Storage solves this problem by managing metadata at the node which is using the file or, in the case of concurrent access, at a dynamically selected node which is using the file.

10 The Logical File System View
IBM Spectrum Scale: simple, powerful, economical.
- Maximum file system size of one million yottabytes
- One big file system, or divide it into as many as 256 smaller file systems
- Each file system can be further divided into fileset containers (tree branches)
- A rule can apply to any file being created, or only to files created within a specific fileset or group of filesets
- Define soft or hard quotas by user, group, or fileset
(Slide diagram: file systems FS1 through FS256, each holding files, with Spectrum Scale policy-based data migration between them.)

11 Enhanced protocol support
The challenge: How can I share my storage infrastructure across all of my legacy and new-generation applications?

The solution: The new IBM Spectrum Scale protocol node allows access to data stored in a Spectrum Scale file system using additional access methods and protocols. The protocol node functions are clustered and can support transparent failover for the NFS and Swift protocols as well as for SMB.

Multiprotocol data access from other systems using the following protocols:
- NFS v3 and v4
- SMB 2 and SMB 3.0 mandatory features / CIFS for Windows support
- OpenStack Swift and S3 API support for object storage

Our next theme for this launch is enhanced protocol support. The challenge: how can I share my storage infrastructure across both legacy and new-generation applications? The solution: our new Spectrum Scale protocol node now allows access to shared storage from Linux and AIX systems (NFS v3 and v4), Windows (SMB 2 and SMB 3), and object access via OpenStack Swift. An illustrative object-access sketch follows this slide.
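A minimal sketch of object access through an S3-compatible endpoint such as the one the Spectrum Scale object protocol can expose. The endpoint URL, bucket name, key, and credentials are placeholders; the real values depend on how the protocol nodes are deployed and configured.

```python
# Stateless RESTful object access via an S3-compatible endpoint (values are placeholders).
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://protocol-node.example.com:8080",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

BUCKET = "analytics-data"  # hypothetical container/bucket

# Each call is a self-contained PUT or GET; no open file handle is held.
s3.put_object(Bucket=BUCKET, Key="reports/2016-q1.csv", Body=b"region,revenue\nEMEA,42\n")
obj = s3.get_object(Bucket=BUCKET, Key="reports/2016-q1.csv")
print(obj["Body"].read().decode())
```

Depending on configuration, the same data may also be reachable through the file interfaces (NFS, SMB, POSIX), which is the point of the multiprotocol design.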

12 The Solution: IBM Spectrum Scale™ brings it all together
IBM Spectrum Scale™ replaces HDFS and NAS file storage
- Full Hadoop interfaces for Map/Reduce analytics processing
- No transfer or ingest required, as the data is already there
- Fully protected with backup software
- File-level access support for NFS, CIFS, FTP, SCP, and HTTPS
- Supports enterprise file sync-and-share via OwnCloud or Funambol

IBM Spectrum Scale™ replaces SAN-based file systems
- Replaces NTFS, EXT4, JFS2, and other POSIX file systems
- Used by over 200 of the top 500 supercomputers
- No file transfers required between different operating systems
- Can be used with everything from databases to video streaming
- For x86, POWER, and z Systems servers
- Secure, with data-at-rest encryption

IBM Spectrum Scale™ offers object access
- Object-level access based on the OpenStack Swift driver and Amazon S3 APIs
- Global namespace

IBM Spectrum Scale™ supports all media
- Spans flash, disk, and tape media

13 Unleash New Storage Economics on a Global Scale
(Slide diagram: client workstations, users and applications, and a compute farm access a single namespace through SMB/CIFS, NFS, POSIX, a Map/Reduce connector, and OpenStack interfaces (Cinder, Swift, Manila, Glance). IBM Spectrum Scale provides automated data placement and data migration across sites A, B, and C and across flash, disk, storage-rich servers, tape, and off-premise storage via a multi-cloud storage toolkit.)

14 Backup

15 Why is Spectrum Scale of interest to customers
Why is Spectrum Scale of interest to customers? What problems does Spectrum Scale solve?

Insufficient capacity and performance: Spectrum Scale-based file servers can scale to enormous performance and capacity, avoiding storage islands while staying easy to manage.

Unreliable storage: Spectrum Scale-based file systems can survive failures of many components without incurring data loss and while remaining available. Spectrum Scale has techniques for monitoring components and recovering from failures extremely quickly.

Cost escalation: by avoiding storage islands, much cost is saved. Management cost, the cost of application downtime, and the cost of over- or under-provisioning storage can all be reduced or eliminated. Additional cost can be saved by having Spectrum Scale automatically move files to cheaper disks or even to tape.

Geographically distributed data sharing: Spectrum Scale lets organizations securely share data across different branches and locations, providing remote access and automated distribution and getting maximum value from data organization-wide.

16

17

18

19 Supports a Wide Range of Hardware and Software
Operating systems
- Linux®: Red Hat Enterprise Linux v6 / 7, SUSE Linux Enterprise Server v11 / 12, Debian v6 / 7, Ubuntu 14.04 / 16.04
- IBM AIX® v6.1 / 7.1
- Windows®: Windows Server 2008 x64 (SP2), Windows Server 2008 R2, Windows 7 x64 SP1, Windows Server 2012 R2, Windows 8.1

Hardware platforms
- IBM Power big endian
- IBM Power little endian
- x86_64
- z Systems

Including software and third-party application software support. IBM takes a comprehensive view of IT operations; one of the primary advantages of working with IBM is an integrated approach to your Elastic Storage solution.

Related IBM offerings: IBM Scale Out Network Attached Storage, IBM Smart Business Storage Cloud, IBM Information Archive, IBM Storwize V7000 Unified Storage, IBM Smart Analytics System, IBM DB2 pureScale, IBM Systems for SAP HANA, IBM SmartCloud Enterprise, IBM SmartCloud Archive, IBM Digital Media Center, information lifecycle management.

ISV software: SAP BI Accelerator, SAP/Business Objects, Oracle, SAS, Ab Initio, Informatica, SAF, ...

Storage: IBM storage and storage hardware from all vendors, such as EMC, Hitachi, Hewlett Packard, DDN.

20 IBM Spectrum Scale benefits over NAS
Better performance
- Eliminate hotspots with massively parallel access to files
- Sequential I/O with Elastic Storage of greater than 400 GB/s
- Throughput advantage for parallel streaming workloads, e.g. technical computing and analytics
- More storage, more files, hyper scale

Simplified management
- Easier management with one global namespace instead of managing islands of NAS arrays, e.g. no need to copy data between compute clusters
- Integrated policy-driven automation
- Fewer storage administrators required

Lower cost
- Optimizes storage tiers including flash, disk, and tape
- Increased efficiency and more efficient provisioning due to parallelization and striping technology
- Remove duplicate copies of data, e.g. run analytics on one copy of data without having to set up a separate silo

21 Object Store vs. File System
Object store
- Interface: web-based GET/PUT/DELETE; RESTful (stateless)
- Metadata synchronization: eventual consistency; no distributed locking
- Software-defined storage: commodity hardware; designed to fault but never fail; built to auto-recover by design
- Features: basic services that scale (KISS); software-extendible with web interfaces

File system
- Interface: POSIX open/seek/read/write/close; stateful
- Synchronization: strict consistency; uses distributed locking
- Hardware and software: best-of-breed hardware; designed not to fault; admin-controlled recovery
- Features: abundant enterprise features built into the products

REST: Representational State Transfer, a web service API. A brief sketch contrasting the two access styles follows this slide.
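The Python sketch below (illustrative only; file path and URL are hypothetical) contrasts the two access styles. The lock shown is a local POSIX advisory byte-range lock, standing in for the distributed locking a cluster file system performs transparently.

```python
# Contrast: stateful POSIX file access vs. a stateless REST-style object GET.
import fcntl
import urllib.request

# File-system style: stateful session - open, lock a range, seek, read, unlock, close.
with open("/gpfs/fs1/reports/2016-q1.csv", "rb") as f:   # hypothetical path
    fcntl.lockf(f, fcntl.LOCK_SH, 1024, 0)   # shared lock on the first 1 KiB
    f.seek(0)
    header = f.read(1024)
    fcntl.lockf(f, fcntl.LOCK_UN, 1024, 0)
print(header[:40])

# Object-store style: stateless request - one self-contained GET, no open handle,
# no locking; consistency is eventual rather than strict.
URL = "http://object-store.example.com/v1/account/reports/2016-q1.csv"  # hypothetical
with urllib.request.urlopen(URL) as resp:
    body = resp.read()
print(body[:40])
```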

22 The History of Spectrum Scale
This infographic is the genealogy of IBM Spectrum Scale, from its birth as a digital media server and HPC research project to its place as a foundational element in the IBM Spectrum Storage family. It highlights key milestones in the product's history, usage, and industry to convey that Spectrum Scale may have started as GPFS, but it is much more now. IBM has invested in the enterprise features that make it easy to use, reliable, and suitable for mission-critical storage of all types. This marked a major milestone in a long history of development. The core features mentioned in the press release demonstrate IBM's commitment to moving software-defined storage into the center of the enterprise. And the pace of change is accelerating.

23 Resources
ibm.com: ibm.com/systems/platformcomputing/products/gpfs/
Public wiki: ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General Parallel File System (GPFS)
IBM Knowledge Center: ibm.com/support/knowledgecenter/SSFKCN/gpfs_welcome.html?lang=en
Data sheet: IBM General Parallel File System (GPFS) Version 4.1 – ibm.com/common/ssi/cgi-bin/ssialias?subtype=SP&infotype=PM&appname=STGE_DC_ZQ_USEN&htmlfid=DCD12374USEN&attachment=DCD12374USEN.PDF
Presentation (to be published soon): Spectrum Scale Quick Install for Linux on IBM System z

IBM internal:
- Spectrum Scale Sales Wiki: Sales Wiki/page/Elastic Storage Sales Wiki
- GPFS Quick Reference Guide: bin/ssialias?subtype=ST&infotype=SA&appname=STGI_DC_ZQ_USEN&htmlfid=DCY12364USEN&attachment=DCY12364USEN.PDF
- GPFS Conversation Starter: bin/ssialias?subtype=RG&infotype=PM&appname=STGI_DC_ZQ_USEN&htmlfid=DCO01637USEN&attachment=DCO01637USEN.PPT
- Presentation: Selling General Parallel File System (GPFS™): bin/ssialias?subtype=PS&infotype=SA&appname=STGI_DC_ZQ_USEN&htmlfid=DCP03210USEN&attachment=DCP03210USEN.PPT
- 01.sso.ibm.com/learning/registry/assets/LearningTechnologies/ltu43563?sourceUrl=http%3A%2F%2Flt.be.ibm.com%2Fstg%2Fltu43563
- Technical overview of GPFS – GPFS Tutorial 2013 (presentations from Frank Schmuck)

24 Easier to Try Spectrum Scale virtual machine
Turn-key Spectrum Scale VM available for download
- Try the latest Spectrum Scale enhancements
- Full functionality on a laptop, desktop, or server
- Incorporate external storage
- Use for live demonstrations, proofs of concept, education, and validating application interoperability
- Scripted demonstrations
- External download link: download the VM image, quick start guide, explorer guide, and advanced user guide

Limitations
- VirtualBox hypervisor only
- Type-2 hypervisor limits performance
- Not supported for production workloads
- Cannot be migrated to bare metal

Easier to try: this release also features a completely functional virtual machine as a complementary download. Designed specifically to install on a laptop, desktop, or server, it lets you see the latest UI, try out the unified file and object storage, test HDFS transparency, or just run the smallest version. Limited Beta today: Internal Link: GA 11/20:

