Scalable, Fault-Tolerant NAS for Oracle - The Next Generation Kevin Closson Chief Software Architect Oracle Platform Solutions, Polyserve Inc.


The Un-”Show Stopper” NAS for Oracle is not “file serving”; let me explain… Think of GbE NFS I/O paths from the Oracle servers to the NAS device that are totally direct, with no VLAN-style indirection. –In these terms, NFS over GbE is just a protocol, as is FCP over Fibre Channel –The proof is in the numbers. A single dual-socket/dual-core AMD server running Oracle10gR2 can push 273MB/s of large I/Os (scattered reads, direct path read/write, etc.) over triple-bonded GbE NICs! Compare that to the infrastructure and hardware costs of 4Gb FCP (~450MB/s, but you need 2 cards for redundancy) –OLTP over modern NFS with GbE is not a challenging I/O profile. However, not all NAS devices are created equal by any means
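As a sanity check on the bandwidth figures above, a rough back-of-the-envelope comparison can be sketched in a few lines. This is illustrative only; the 90% payload-efficiency figure is an assumption, not a measurement from the slides.

```python
# Rough usable-throughput comparison: triple-bonded GbE NFS vs a 4Gb FCP HBA.
# PROTOCOL_EFFICIENCY is an assumed fraction of wire rate left after overhead.
PROTOCOL_EFFICIENCY = 0.9

def usable_mbps(wire_gbps, links=1, efficiency=PROTOCOL_EFFICIENCY):
    """Approximate usable MB/s for one or more bonded links."""
    return wire_gbps * links * efficiency * 1000 / 8

bonded_gbe = usable_mbps(1.0, links=3)   # triple-bonded GbE
fc_4gb = usable_mbps(4.0)                # single 4Gb FCP HBA
print(f"3x GbE ~ {bonded_gbe:.0f} MB/s, 4Gb FCP ~ {fc_4gb:.0f} MB/s")
```

Under these assumptions triple-bonded GbE tops out near 337MB/s, so the measured 273MB/s sits plausibly under that ceiling, and the ~450MB/s estimate matches the 4Gb FCP figure quoted above.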

Agenda Oracle on NAS NAS Architecture Proof of Concept Testing Special Characteristics

Oracle on NAS

Connectivity –Fantasyland Dream Grid™ would be nearly impossible with a FibreChannel switched fabric, for instance: 128 nodes == 256 HBAs, 2 switches each with 256 ports just for the servers, and then you still have to work out the storage paths Simplicity –NFS is simple. Anyone with a pulse can plug in Cat-5 and mount filesystems. –MUCH MUCH MUCH MUCH MUCH simpler than: raw partitions for ASM; raw or OCFS2 for CRS; Oracle Home? Local Ext3 or UFS? What a mess –Supports shared Oracle Home, shared APPL_TOP too –But not simpler than a Certified Third Party Cluster Filesystem, but that is a different presentation Cost –FC HBAs are always going to be more expensive than NICs –Ports on enterprise-level FC switches are very expensive
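The server-side port arithmetic behind the 128-node example can be sketched directly (a sketch only; storage-side ports and inter-switch links would come on top of this):

```python
# Server-side FibreChannel port arithmetic for a dual-HBA grid, mirroring the
# 128-node example on the slide. Storage-path ports are deliberately excluded.
def fabric_demand(nodes, hbas_per_node=2):
    """HBA count and switch-port demand (servers only) for an n-node grid."""
    hbas = nodes * hbas_per_node
    return {"hbas": hbas, "server_switch_ports": hbas}

print(fabric_demand(128))   # 128 nodes -> 256 HBAs, 256 switch ports for servers
```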

Oracle on NAS NFS Client Improvements –Direct I/O: open() with O_DIRECT works with the Linux NFS client, the Solaris NFS client, and likely others Oracle Improvements –init.ora filesystemio_options=directIO –No async I/O on NFS, but look at the numbers –Oracle runtime checks mount options Caveat: it doesn’t always get it right, but at least it tries (OSDS) –Don’t be surprised to see Oracle offer a platform-independent NFS client –NFS v4 will have more improvements
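The direct-I/O open the slide refers to can be sketched minimally as follows. This is an illustration of the mechanism behind filesystemio_options=directIO, not Oracle's actual code; O_DIRECT is Linux-specific, and some filesystems reject it, so the sketch falls back gracefully.

```python
import os
import tempfile

# Minimal sketch of a direct-I/O open, the mechanism that bypasses the OS
# page cache. O_DIRECT is Linux-specific and some filesystems (e.g. tmpfs)
# refuse it, hence the getattr default and the fallback path.
path = os.path.join(tempfile.gettempdir(), "odirect_demo.dat")
flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_DIRECT", 0)

try:
    fd = os.open(path, flags, 0o600)
    os.close(fd)
    direct_ok = True
except OSError:
    direct_ok = False   # filesystem refused O_DIRECT

print("direct I/O open succeeded:", direct_ok)
if os.path.exists(path):
    os.remove(path)
```

Note that real O_DIRECT reads and writes also require sector-aligned buffers and transfer sizes, which is part of why the Oracle runtime checks mount options before enabling it.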

NAS Architecture

Single-headed Filers Clustered Single-headed Filers Asymmetrical Multi-headed NAS Symmetrical Multi-headed NAS

Single-headed Filer Architecture

NAS Architecture: Single-headed Filer Filesystems /u01 /u02 /u03 GigE Network

Oracle Servers Accessing a Single-headed Filer: I/O Bottleneck [Diagram: Oracle database servers accessing filesystems /u01, /u02, and /u03 through a single filer head. A single one of the database servers has the same (or more) bus bandwidth as the filer, making the filer head the I/O bottleneck.]

Oracle Servers Accessing a Single-headed Filer: Single Point of Failure Oracle Database Servers Filesystems /u01 /u02 /u03 Single Point of Failure Highly Available through failover-HA, DataGuard, RAC, etc

Clustered Single-headed Filers

Architecture: Cluster of Single-headed Filers Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover

Oracle Servers Accessing a Cluster of Single-headed Filers Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers

Architecture: Cluster of Single-headed Filers Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers What if /u03 I/O saturates this Filer?

Filer I/O Bottleneck. Resolution == Data Migration Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers Filesystems /u04 Migrate some of the “hot” data to /u04

Data Migration Remedies I/O Bottleneck Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers Filesystems /u04 Migrate some of the “hot” data to /u04 NEW Single Point of Failure

Summary: Single-headed Filers Cluster to mitigate S.P.O.F –Clustering is a pure afterthought with filers –Failover Times? Long, really really long. –Transparent? Not in many cases. Migrate data to mitigate I/O bottlenecks –What if the data “hot spot” moves with time? The Dog Chasing His Tail Syndrome Poor Modularity Expanded by pairs for data availability What’s all this talk about CNS?

Asymmetrical Multi-headed NAS Architecture

FibreChannel SAN … … Three Active NAS Heads / Three For Failover and “Pools of Data” Note: Some variants of this architecture support M:1 Active:Standby but that doesn’t really change much. Oracle Database Servers SAN Gateway

Asymmetrical NAS Gateway Architecture Really not much different than clusters of single-headed filers: –1 NAS head to 1 filesystem relationship –Migrate data to mitigate I/O contention –Failover not transparent But: –More Modular Not necessary to scale up by pairs

Symmetric Multi-headed NAS

HP Enterprise File Services Clustered Gateway

Symmetric vs Asymmetric [Diagram: in the asymmetric design, each filesystem (/Dir1/File1, /Dir2/File2, /Dir3/File3) is owned by exactly one NAS head; in the symmetric EFS-CG design, every NAS head presents every filesystem.]

Enterprise File Services Clustered Gateway Component Overview Cluster Volume Manager –RAID 0 –Expand Online Fully Distributed, Symmetric Cluster Filesystem –The embedded filesystem is a fully distributed, symmetric cluster filesystem Virtual NFS Services –Filesystems are presented through Virtual NFS Services Modular and Scalable –Add NAS heads without interruption –All filesystems can be presented for read/write through any/all NAS heads

EFS-CG Clustered Volume Manager RAID 0 –LUNs are RAID 1, so this implements S.A.M.E. Expand online –Add LUNs, grow the volume –Up to 16TB in a single volume
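The RAID-0 striping the volume manager performs can be sketched as a simple address mapping. The stripe size and LUN count here are illustrative, not EFS-CG's actual parameters:

```python
# Sketch of RAID-0 (striping) address mapping across LUNs, as a clustered
# volume manager might lay data out. Stripe unit size is an assumption.
STRIPE_BLOCKS = 128   # blocks per stripe unit (illustrative)

def stripe_map(block, nluns, stripe=STRIPE_BLOCKS):
    """Map a logical block to (lun_index, block_within_lun)."""
    unit, offset = divmod(block, stripe)
    lun = unit % nluns                         # stripe units rotate across LUNs
    block_in_lun = (unit // nluns) * stripe + offset
    return lun, block_in_lun

# Adjacent stripe units land on different LUNs, spreading sequential I/O.
print(stripe_map(0, 4), stripe_map(128, 4), stripe_map(512, 4))
```

Because the underlying LUNs are themselves RAID 1, striping over them yields the Stripe And Mirror Everything (S.A.M.E.) layout the slide mentions.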

The EFS-CG Filesystem All NAS devices have embedded operating systems and file systems, but the EFS-CG is: –Fully symmetric: distributed lock manager, no metadata server or lock server –A general-purpose clustered filesystem –Standard C Library and POSIX support –Journaled, with online recovery –Proprietary format, but uses standard Linux filesystem semantics and system calls, including flock() and fcntl() clusterwide –Expandable online: a single filesystem up to 16TB, up to 254 filesystems in the current release

EFS-CG Filesystem Scalability

Scalability: Single Filesystem Export Using x86 Xeon-based NAS Heads (Old Numbers) [Chart: MegaBytes per Second (MB/s) versus Cluster Size (Nodes); aggregate throughput climbs as NAS heads are added, well beyond the approximate single-headed filer limit. HP StorageWorks Clustered File System is optimized for both READ and WRITE performance.]

Virtual NFS Services Specialized virtual host IP Filesystem groups are exported through VNFS VNFS failover and rehosting are 100% transparent to the NFS client –Including active file descriptors, file locks (e.g., fcntl/flock), etc.
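The rehosting idea above can be modeled in a few lines: clients bind to a virtual service name, not a physical head, so moving the service is invisible to them. The service and head names below are made up for illustration:

```python
# Toy model of VNFS rehosting. Clients mount a virtual NFS service name;
# the service-to-head mapping can change underneath without touching clients.
services = {"vnfs1": "head-a", "vnfs2": "head-b"}   # service -> current NAS head

def rehost(service, new_head):
    """Move a virtual NFS service to another head; its name never changes."""
    services[service] = new_head

mount_target = "vnfs1"      # what clients have in /etc/fstab
rehost("vnfs1", "head-b")   # failover or load rebalance
print(mount_target, "is now served by", services[mount_target])
```

In the real product the rehosted service also carries its virtual IP and lock state with it, which is what makes the move transparent even to clients holding open file descriptors.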

EFS-CG Filesystems and VNFS

[Diagram: Oracle database servers mounting filesystems /u01 through /u04 from an Enterprise File Services Clustered Gateway via virtual NFS services (vnfs1, vnfs2b, vnfs1b, vnfs3b, …); any VNFS can be presented by any NAS head, and every head can serve every filesystem.]

EFS-CG Management Console

EFS-CG Proof of Concept

Goals –Use Oracle10g with a single high-performance filesystem for the RAC database and measure: –Durability –Scalability –Virtual NFS functionality

EFS-CG Proof of Concept The 4 filesystems presented by the EFS-CG were: –/u01. This filesystem contained all Oracle executables (e.g., $ORACLE_HOME) –/u02. This filesystem contained the Oracle10gR2 clusterware files (e.g., OCR, CSS) and some datafiles and External Tables for ETL testing –/u03. This filesystem was lower-performance space used for miscellaneous tests such as disk-to-disk backup –/u04. This filesystem resided on a high-performance volume that spanned two storage arrays. It contained the main benchmark database

EFS-CG P.O.C. Parallel Tablespace Creation All datafiles created in a single exported filesystem –Proof of multi-headed, single filesystem write scalability

EFS-CG P.O.C. Parallel Tablespace Creation

EFS-CG P.O.C. Full Table Scan Performance All datafiles located in a single exported filesystem –Proof of multi-headed, single filesystem sequential I/O scalability

EFS-CG P.O.C. Parallel Query Scan Throughput

EFS-CG P.O.C. OLTP Testing OLTP Database based on an Order Entry Schema and workload Test areas –Physical I/O Scalability under Oracle OLTP –Long Duration Testing

EFS-CG P.O.C. OLTP Workload: Average Cost per Transaction

Oracle Statistic          Average per Transaction
SGA Logical Reads         33
SQL Executions            5
Physical I/O              6.9 *
Block Changes             8.5
User Calls                6
GCS/GES Messages Sent     12

* Averages with RAC can be deceiving; be aware of CR sends

EFS-CG P.O.C. OLTP Testing

EFS-CG P.O.C. OLTP Testing. Physical I/O Operations

EFS-CG Handles all OLTP I/O Types Sufficiently—no Logging Bottleneck

Long Duration Stress Test Benchmarks do not prove durability –Benchmarks are “sprints” –Typically minute measured runs (e.g., TPC-C) This long duration stress test was no benchmark by any means –Ramp OLTP I/O up to roughly 10,000/sec –Run non-stop until the aggregate I/O breaks through 10 Billion physical transfers –10,000 physical I/O transfers per second for every second of nearly 12 days
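The duration claim above is easy to verify with simple arithmetic: sustaining 10,000 physical transfers per second until 10 billion transfers have accumulated takes nearly 12 days.

```python
# Check the slide's arithmetic: 10 billion transfers at 10,000 IOPS.
TARGET_TRANSFERS = 10_000_000_000
IOPS = 10_000

seconds = TARGET_TRANSFERS / IOPS   # 1,000,000 seconds
days = seconds / 86_400
print(f"{days:.2f} days")           # about 11.57 days, i.e. "nearly 12 days"
```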

Long Duration Stress Test

Special Characteristics

The EFS-CG NAS Heads are Linux servers –Tasks can be executed directly on the EFS-CG NAS heads at FCP speed: –Compression –ETL, data importing –Backup –etc.

Example of EFS-CG Special Functionality A table is exported on one of the RAC nodes The export file is then compressed on the EFS-CG NAS head: –CPU comes from the NAS head instead of the database servers The NAS heads are really just protocol engines; I/O DMAs are offloaded to the I/O subsystems, so there are plenty of spare cycles –Data movement at FCP rate instead of GigE Offloads the I/O fabric (the NFS paths from the servers to the EFS-CG)

Export a Table to NFS Mount

Compress it on the NAS Head
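The compression step shown on this slide amounts to stream-compressing the export dump on the NAS head's local path. A minimal sketch (file name and contents are stand-ins, not from the original demo):

```python
import gzip
import os
import shutil
import tempfile

# Sketch of "compress on the NAS head": gzip an export dump where it lives,
# sparing the database servers' CPUs and the GbE NFS paths. The dump file
# here is a synthetic stand-in for a real Oracle export.
dump = os.path.join(tempfile.gettempdir(), "expdat.dmp")
with open(dump, "wb") as f:
    f.write(b"dummy export data " * 1000)

# Stream-compress so the whole file never has to sit in memory.
with open(dump, "rb") as src, gzip.open(dump + ".gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

print(os.path.getsize(dump), "->", os.path.getsize(dump + ".gz"))
```

The same pattern applies to the other offloaded tasks (backup, ETL staging): the work reads and writes at FCP rate on the head instead of crossing the GbE NFS fabric twice.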

Questions and Answers

Backup Slide

EFS-CG Scales “Up” and “Out” [Diagram: Oracle servers reach the EFS-CG NAS heads over 3 GbE NFS paths (which can be triple bonded, etc.) through an Ethernet switch; the NAS heads connect to the SAN through FibreChannel switches.]