Scalable, Fault-Tolerant NAS for Oracle - The Next Generation Kevin Closson Chief Software Architect Oracle Platform Solutions, Polyserve Inc.


1 Scalable, Fault-Tolerant NAS for Oracle - The Next Generation Kevin Closson Chief Software Architect Oracle Platform Solutions, Polyserve Inc

2 The Un-“Show Stopper”
NAS for Oracle is not “file serving”. Let me explain…
Think of GbE NFS I/O paths from the Oracle servers to the NAS device that are totally direct, with no VLAN-style indirection.
–In these terms, NFS over GbE is just a protocol, as is FCP over Fibre Channel
–The proof is in the numbers. A single dual-socket/dual-core AMD server running Oracle10gR2 can push 273MB/s of large I/Os (scattered reads, direct path read/write, etc) through triple-bonded GbE NICs! Compare that to the infrastructure and HW costs of 4Gb FCP (~450MB/s, but you need 2 cards for redundancy)
–OLTP over modern NFS with GbE is not a challenging I/O profile. However, not all NAS devices are created equal by any means
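To put the 273MB/s figure in context, here is a rough back-of-the-envelope sketch. It assumes a nominal 125 MB/s per GbE link (1000 Mbit/s divided by 8) and ignores Ethernet, IP and NFS protocol overhead, so the real ceiling is somewhat lower than the one computed here. The variable names are illustrative only.

```python
# Nominal ceiling of one Gigabit Ethernet link, ignoring protocol overhead
# (assumption: 1000 Mbit/s / 8 bits per byte).
GBE_MB_PER_S = 1000 / 8  # 125.0 MB/s

bonded_links = 3          # triple-bonded GbE NICs, as stated on the slide
measured_mb_per_s = 273   # throughput quoted on the slide

# Aggregate nominal ceiling of the bond and the share of it actually used.
bonded_ceiling = bonded_links * GBE_MB_PER_S
utilization = measured_mb_per_s / bonded_ceiling

print(f"bonded ceiling: {bonded_ceiling:.0f} MB/s")
print(f"measured share of ceiling: {utilization:.1%}")
```

Under these assumptions the quoted throughput is a bit under three quarters of the nominal bonded line rate.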

3 Agenda Oracle on NAS NAS Architecture Proof of Concept Testing Special Characteristics

4 Oracle on NAS

5 Connectivity
–Fantasyland Dream Grid™ would be nearly impossible with a FibreChannel switched fabric. For instance: 128 nodes == 256 HBAs, 2 switches each with 256 ports just for the servers, and then you have to work out storage paths
Simplicity
–NFS is simple. Anyone with a pulse can plug in cat-5 and mount filesystems.
–MUCH MUCH MUCH MUCH MUCH simpler than:
  Raw partitions for ASM
  Raw, OCFS2 for CRS
  Oracle Home? Local Ext3 or UFS?
  What a mess
–Supports shared Oracle Home, shared APPL_TOP too
–Not simpler than a Certified Third Party Cluster Filesystem, though, but that is a different presentation
Cost
–FC HBAs are always going to be more expensive than NICs
–Ports on enterprise-level FC switches are very expensive

6 Oracle on NAS
NFS Client Improvements
–Direct I/O: open(,O_DIRECT,) works with the Linux NFS client, the Solaris NFS client, and likely others
Oracle Improvements
–init.ora filesystemio_options=directIO
–No async I/O on NFS, but look at the numbers
–Oracle runtime checks mount options
  Caveat: It doesn’t always get it right, but at least it tries (OSDS)
–Don’t be surprised to see Oracle offer a platform-independent NFS client
NFS V4 will have more improvements
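The slide says Oracle checks NFS mount options at runtime but does not say how. The following is only a sketch of what such a check could look like, not Oracle's logic. It parses a line in the /proc/mounts layout and reports which options from a required list are absent. The `REQUIRED` list, the function names and the sample mount lines are all hypothetical and chosen purely for illustration.

```python
def parse_mount_options(mount_line):
    """Parse one /proc/mounts-style line into {option: value or True}."""
    # Fields: device, mount point, fs type, comma-separated options, dump, pass
    fields = mount_line.split()
    options = {}
    for item in fields[3].split(","):
        key, _, value = item.partition("=")
        options[key] = value if value else True
    return options


def missing_options(mount_line, required):
    """Return the required option names that the mount line does not carry."""
    present = parse_mount_options(mount_line)
    return [name for name in required if name not in present]


# Hypothetical requirement list, for illustration only (not Oracle's list).
REQUIRED = ["hard", "tcp", "rsize", "wsize"]

good_line = "filer:/u04 /u04 nfs rw,hard,tcp,rsize=32768,wsize=32768 0 0"
bad_line = "filer:/u04 /u04 nfs rw,soft,udp 0 0"

print(missing_options(good_line, REQUIRED))
print(missing_options(bad_line, REQUIRED))
```

A check like this only compares option names. It would not catch the case the slide's caveat hints at, where the options reported by the operating system differ from what the client actually negotiated.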

7 NAS Architecture

8 NAS Architecture
–Single-headed Filers
–Clustered Single-headed Filers
–Asymmetrical Multi-headed NAS
–Symmetrical Multi-headed NAS

9 Single Headed Filer Architecture

10 NAS Architecture: Single-headed Filer Filesystems /u01 /u02 /u03 GigE Network

11 Oracle Servers Accessing a Single-headed Filer: I/O Bottleneck
[Diagram: Oracle Database Servers; Filesystems /u01 /u02 /u03]
A single one of these… Has the same (or more) bus bandwidth as this!

12 Oracle Servers Accessing a Single-headed Filer: Single Point of Failure Oracle Database Servers Filesystems /u01 /u02 /u03 Single Point of Failure Highly Available through failover-HA, DataGuard, RAC, etc

13 Clustered Single-headed Filers

14 Architecture: Cluster of Single-headed Filers Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover

15 Oracle Servers Accessing a Cluster of Single-headed Filers Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers

16 Architecture: Cluster of Single-headed Filers Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers What if /u03 I/O saturates this Filer?

17 Filer I/O Bottleneck. Resolution == Data Migration Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers Filesystems /u04 Migrate some of the “hot” data to /u04

18 Data Migration Remedies I/O Bottleneck Filesystems /u01 /u02 Filesystems /u03 Paths Active After Failover Oracle Database Servers Filesystems /u04 Migrate some of the “hot” data to /u04 NEW Single Point of Failure

19 Summary: Single-headed Filers
Cluster to mitigate S.P.O.F
–Clustering is a pure afterthought with filers
–Failover Times? Long, really really long.
–Transparent? Not in many cases.
Migrate data to mitigate I/O bottlenecks
–What if the data “hot spot” moves with time? The Dog Chasing His Tail Syndrome
Poor Modularity
–Expanded by pairs for data availability
What’s all this talk about CNS?

20 Asymmetrical Multi-headed NAS Architecture

21 [Diagram: Oracle Database Servers, SAN Gateway, FibreChannel SAN]
Three Active NAS Heads / Three For Failover and “Pools of Data”
Note: Some variants of this architecture support M:1 Active:Standby but that doesn’t really change much.

22 Asymmetrical NAS Gateway Architecture
Really not much different than clusters of single-headed filers:
–1 NAS head to 1 filesystem relationship
–Migrate data to mitigate I/O contention
–Failover not transparent
But:
–More Modular
–Not necessary to scale up by pairs

23 Symmetric Multi-headed NAS

24 HP Enterprise File Services Clustered Gateway

25 Symmetric vs Asymmetric
[Diagram: two groups of three NAS heads serving the paths /Dir1/File1, /Dir2/File2 and /Dir3/File3; one group is labeled EFS-CG]

26 Enterprise File Services Clustered Gateway Component Overview
Cluster Volume Manager
–RAID 0
–Expand Online
Fully Distributed, Symmetric Cluster Filesystem
–The embedded filesystem is a fully distributed, symmetric cluster filesystem
Virtual NFS Services
–Filesystems are presented through Virtual NFS Services
Modular and Scalable
–Add NAS heads without interruption
–All filesystems can be presented for read/write through any/all NAS heads

27 EFS-CG Clustered Volume Manager
RAID 0
–LUNs are RAID 1, so this implements S.A.M.E.
Expand online
–Add LUNs, grow volume
Up to 16TB
–Single Volume
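Striping (RAID 0) across LUNs that are themselves mirrored (RAID 1) is what gives the "stripe and mirror everything" layout. The slides do not describe the volume manager's actual address mapping, so the following is only a generic round-robin striping sketch. The function name, the stripe size of 4 blocks and the LUN count of 3 are assumptions chosen for illustration.

```python
from collections import Counter


def locate_block(logical_block, stripe_blocks, lun_count):
    """Map a logical volume block to (lun index, block offset within that LUN)
    using plain round-robin striping."""
    stripe_index = logical_block // stripe_blocks   # which stripe unit overall
    within_stripe = logical_block % stripe_blocks   # position inside that unit
    lun = stripe_index % lun_count                  # units rotate across LUNs
    offset = (stripe_index // lun_count) * stripe_blocks + within_stripe
    return lun, offset


# Illustrative parameters only: 4-block stripe units over 3 mirrored LUNs.
STRIPE_BLOCKS = 4
LUN_COUNT = 3

for block in (0, 3, 4, 8, 12, 13):
    print(block, locate_block(block, STRIPE_BLOCKS, LUN_COUNT))

# Count how many of the first 24 logical blocks land on each LUN.
spread = Counter(locate_block(b, STRIPE_BLOCKS, LUN_COUNT)[0] for b in range(24))
print(dict(spread))
```

The count at the end shows why striping helps: consecutive stripe units land on different LUNs, so a large sequential transfer is spread evenly across all of them.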

28 The EFS-CG Filesystem
All NAS devices have embedded operating systems and file systems, but the EFS-CG is:
–Fully Symmetric
  Distributed Lock Manager
  No Metadata Server or Lock Server
–General Purpose clustered file system
–Standard C Library and POSIX support
–Journaled with Online recovery
  Proprietary format but uses standard Linux file system semantics and system calls including flock() and fcntl() clusterwide
Expand a single filesystem online up to 16TB, up to 254 filesystems in current release.
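For readers unfamiliar with the two locking calls named above, here is a minimal sketch of them using Python's standard `fcntl` module on a local temporary file. It shows only the single-host semantics: a whole-file `flock()` lock that blocks a second opener, and an `fcntl()`-style byte-range lock. Nothing here exercises or demonstrates the clusterwide behaviour claimed for the EFS-CG, and the helper names are hypothetical.

```python
import fcntl
import os
import tempfile


def second_opener_is_blocked(path):
    """Hold an exclusive flock() on one open of the file and report whether a
    second, independent open is refused a non-blocking exclusive lock."""
    with open(path, "rb+") as first, open(path, "rb+") as second:
        fcntl.flock(first.fileno(), fcntl.LOCK_EX)
        try:
            fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        finally:
            fcntl.flock(first.fileno(), fcntl.LOCK_UN)
        return False


def lock_byte_range(path, start, length):
    """Take and release an fcntl()-style exclusive lock on a byte range."""
    with open(path, "rb+") as handle:
        fcntl.lockf(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB,
                    length, start, os.SEEK_SET)
        fcntl.lockf(handle.fileno(), fcntl.LOCK_UN, length, start, os.SEEK_SET)
    return True


# Demo on a throwaway local file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"x" * 100)
os.close(fd)
try:
    blocked = second_opener_is_blocked(demo_path)
    range_locked = lock_byte_range(demo_path, 0, 10)
    print("second opener blocked:", blocked)
    print("byte range locked and released:", range_locked)
finally:
    os.remove(demo_path)
```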

29 EFS-CG Filesystem Scalability

30 Scalability. Single Filesystem Export Using x86 Xeon-based NAS Heads (Old Numbers)
[Chart: throughput in MegaBytes per Second (MB/s) by Cluster Size (Nodes)]
Nodes   MB/s
1       123
2       246
4       493
6       739
8       986
9       1,084
10      1,196
Chart annotation: Approximate Single-headed Filer limit
HP StorageWorks Clustered File System is optimized for both READ and WRITE performance.
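One way to read this chart is to compare each data point with perfect linear scaling from the single-node result. The sketch below assumes the chart's seven values pair up with cluster sizes of 1, 2, 4, 6, 8, 9 and 10 nodes, which is how the axis labels appear to line up. The efficiency metric is an illustration added here, not something the slide reports.

```python
# Data points as read from the chart (assumed pairing of nodes to MB/s).
nodes = [1, 2, 4, 6, 8, 9, 10]
mb_per_s = [123, 246, 493, 739, 986, 1084, 1196]

baseline = mb_per_s[0]  # single-node throughput

# Efficiency = measured throughput / (node count * single-node throughput).
efficiency = {n: t / (n * baseline) for n, t in zip(nodes, mb_per_s)}

for n in nodes:
    print(f"{n:>2} nodes: {efficiency[n]:.1%} of linear scaling")
```

Values at or very near 100% up to 8 nodes, tapering slightly at 9 and 10, would indicate close to linear scaling for a single exported filesystem.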

31 Virtual NFS Services
Specialized Virtual Host IP
Filesystem groups are exported through VNFS
VNFS failover and rehosting are 100% transparent to the NFS client
–Including active file descriptors, file locks (e.g. fcntl/flock), etc

32 EFS-CG Filesystems and VNFS

33 Enterprise File Services Clustered Gateway
[Diagram: Oracle Database Servers mounting /u01, /u02, /u03 and /u04 from the NAS heads through Virtual NFS Services labeled vnfs1, vnfs1b, vnfs2b and vnfs3b]

34 EFS-CG Management Console

35 EFS-CG Proof of Concept

36 Goals
–Use Oracle10g (10.2.0.1) with a single high performance filesystem for the RAC database and measure:
  Durability
  Scalability
  Virtual NFS functionality

37 EFS-CG Proof of Concept
The 4 filesystems presented by the EFS-CG were:
–/u01. This filesystem contained all Oracle executables (e.g., $ORACLE_HOME)
–/u02. This filesystem contained the Oracle10gR2 clusterware files (e.g., OCR, CSS) and some datafiles and External Tables for ETL testing
–/u03. This filesystem was lower-performance space used for miscellaneous tests such as disk-to-disk backup
–/u04. This filesystem resided on a high-performance volume that spanned two storage arrays. It contained the main benchmark database

38 EFS-CG P.O.C. Parallel Tablespace Creation All datafiles created in a single exported filesystem –Proof of multi-headed, single filesystem write scalability

39 EFS-CG P.O.C. Parallel Tablespace Creation

40 EFS-CG P.O.C. Full Table Scan Performance All datafiles located in a single exported filesystem –Proof of multi-headed, single filesystem sequential I/O scalability

41 EFS-CG P.O.C. Parallel Query Scan Throughput

42 EFS-CG P.O.C. OLTP Testing OLTP Database based on an Order Entry Schema and workload Test areas –Physical I/O Scalability under Oracle OLTP –Long Duration Testing

43 EFS-CG P.O.C. OLTP Workload: Transaction Avg Cost
Oracle Statistics          Average Per Transaction
SGA Logical Reads          33
SQL Executions             5
Physical I/O               6.9 *
Block Changes              8.5
User Calls                 6
GCS/GES Messages Sent      12
* Averages with RAC can be deceiving, be aware of CR sends

44 EFS-CG P.O.C. OLTP Testing

45 EFS-CG P.O.C. OLTP Testing. Physical I/O Operations

46 EFS-CG Handles all OLTP I/O Types Sufficiently—no Logging Bottleneck

47 Long Duration Stress Test
Benchmarks do not prove durability
–Benchmarks are “sprints”
–Typically 30-60 minute measured runs (e.g., TPC-C)
This long duration stress test was no benchmark by any means
–Ramp OLTP I/O up to roughly 10,000/sec
–Run non-stop until the aggregate I/O breaks through 10 Billion physical transfers
–10,000 physical I/O transfers per second for every second of nearly 12 days
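The "nearly 12 days" figure follows directly from the two numbers on the slide. A quick check, assuming the I/O rate holds steady at 10,000 transfers per second for the whole run:

```python
target_transfers = 10_000_000_000   # 10 billion physical transfers
rate_per_second = 10_000            # sustained physical I/O rate

# Time needed at a constant rate, in seconds and then in days.
seconds = target_transfers / rate_per_second
days = seconds / 86_400             # 86,400 seconds per day

print(f"{seconds:,.0f} seconds = {days:.2f} days")
```

One million seconds works out to roughly eleven and a half days, which matches the slide's description.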

48 Long Duration Stress Test


52 Special Characteristics

53 The EFS-CG NAS Heads are Linux Servers
–Tasks can be executed directly within the EFS-CG NAS Heads at FCP speed:
  Compression
  ETL, data importing
  Backup
  etc.

54 Example of EFS-CG Special Functionality
A table is exported on one of the RAC nodes
The export file is then compressed on the EFS-CG NAS head:
–CPU from NAS Head, instead of database servers
  The NAS heads are really just protocol engines. I/O DMAs are offloaded to the I/O subsystems. There are plenty of spare cycles.
–Data movement at FCP rate instead of GigE
  Offload the I/O fabric (NFS paths from servers to the EFS-CG)
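The transcript does not preserve the commands used on the following slides, so the sketch below is only a stand-in for the compression step. It uses Python's standard `gzip` module to stream-compress a file in place on whichever host runs it. The idea is that the same code run on a NAS head would read and write through the SAN rather than over the NFS paths. The function name, the file name `orders.dmp` and the sample data are hypothetical.

```python
import gzip
import os
import shutil
import tempfile


def compress_file(src_path, chunk_size=1024 * 1024):
    """Stream-compress src_path to src_path + '.gz' and return the new path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        # Copy in chunks so a large export file is never held in memory.
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path


# Demo with a small, highly repetitive stand-in for an export file.
work_dir = tempfile.mkdtemp()
export_path = os.path.join(work_dir, "orders.dmp")   # hypothetical file name
original = b"ORDER_ID,CUSTOMER,AMOUNT\n" * 10_000
with open(export_path, "wb") as handle:
    handle.write(original)

compressed_path = compress_file(export_path)
original_size = os.path.getsize(export_path)
compressed_size = os.path.getsize(compressed_path)

with gzip.open(compressed_path, "rb") as handle:
    round_trip = handle.read()

print(f"{original_size} bytes -> {compressed_size} bytes")
shutil.rmtree(work_dir)
```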

55 Export a Table to NFS Mount

56 Compress it on the NAS Head

57 Questions and Answers

58 Backup Slide

59 EFS-CG Scales “Up” and “Out”
[Diagram: EFS-CG NAS Heads … connected to the SAN through FibreChannel Switches and to an Ethernet Switch]
3 GbE NFS Paths: Can be triple bonded, etc

