
1 Reliable Datagram Sockets and InfiniBand Hanan Hit NoCOUG Staff 2010

2 Agenda
–InfiniBand Basics
–What is RDS (Reliable Datagram Sockets)?
–Advantages of RDS over InfiniBand
–Architecture Overview
–TPC-H over 11g Benchmark
–InfiniBand vs. 10GE

3 Value Proposition – Oracle Database RAC
Oracle Database Real Application Clusters (RAC) provides the ability to build an application platform from multiple systems clustered together.
Benefits:
–Performance: increase the performance of a RAC database by adding additional servers to the cluster.
–Fault Tolerance: a RAC database is constructed from multiple instances; loss of an instance does not bring down the entire database.
–Scalability: scale a RAC database by adding instances to the cluster database.

4 Some Facts
–High-end OLTP database applications range from 10-20 TB in size with 2-10k IOPS.
–High-end DW applications fall into the 20-40 TB category, with an I/O bandwidth requirement of around 4-8 GB per second.
–Two-socket x86_64 servers currently offer the best price point. Their major limitations are the limited number of slots available for external I/O cards and the CPU cost of processing I/O with conventional kernel-based I/O mechanisms.
–The main challenge in building cluster databases that run on multiple servers is providing low-cost, balanced I/O bandwidth. Conventional Fibre Channel storage arrays, with their expensive plumbing, do not scale well enough to create the balance needed to utilize these database servers optimally.

5 IBA / Reliable Datagram Sockets (RDS) Protocol
What is IBA?
–InfiniBand Architecture (IBA) is an industry-standard, channel-based, switched-fabric, high-speed interconnect architecture with low latency and high throughput.
–The InfiniBand architecture specification defines a connection between processor nodes and high-performance I/O nodes such as storage devices.
What is RDS?
–A low-overhead, low-latency, high-bandwidth, ultra-reliable, supportable Inter-Process Communication (IPC) protocol and transport system.
–Matches Oracle's existing IPC models for RAC communication.
–Optimized for transfers from 200 bytes to 8 MB.
–Based on the socket API (see the sketch below).
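Editor's note: since the slide stresses that RDS is exposed through the standard socket API, here is a minimal C sketch (not from the original deck) of sending one datagram over an RDS socket on Linux. It assumes the rds kernel module is loaded; AF_RDS may need to be defined by hand on systems whose libc headers predate it, and the IP addresses and port are placeholders.

/* Minimal RDS send sketch (illustrative only).
 * Build: gcc rds_send.c -o rds_send
 * Assumes the "rds" (and rds_rdma) kernel modules are loaded.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef AF_RDS
#define AF_RDS 21   /* Linux value; older headers may not define it */
#endif

int main(void)
{
    /* RDS sockets are datagram-oriented but reliable (SOCK_SEQPACKET). */
    int fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (fd < 0) { perror("socket(AF_RDS)"); return 1; }

    /* Bind to a local IP/port; RDS addressing reuses sockaddr_in. */
    struct sockaddr_in local = { 0 };
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = inet_addr("192.168.10.1");  /* placeholder */
    local.sin_port = htons(18634);                      /* placeholder */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind"); return 1;
    }

    /* Send a datagram to a peer; the kernel delivers it reliably and
     * in order over the underlying InfiniBand (or TCP) transport. */
    struct sockaddr_in peer = { 0 };
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = inet_addr("192.168.10.2");   /* placeholder */
    peer.sin_port = htons(18634);
    const char msg[] = "hello over RDS";
    if (sendto(fd, msg, sizeof(msg), 0,
               (struct sockaddr *)&peer, sizeof(peer)) < 0)
        perror("sendto");

    close(fd);
    return 0;
}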

6 Reliable Datagram Sockets (RDS) Protocol
Leverages InfiniBand's built-in high-availability and load-balancing features:
–Port failover on the same HCA
–HCA failover on the same system
–Automatic load balancing
Open source in OpenFabrics / OFED: http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-docs/

7 Advantages of RDS over InfiniBand
–Lowering data center TCO requires efficient fabrics; Oracle RAC 11g will scale for database-intensive applications only with the proper high-speed protocol and an efficient interconnect.
RDS over 10GE:
–10 Gbps is not enough to feed multi-core server I/O needs; each core may require > 3 Gbps.
–Packets can be lost and require retransmission, so reported statistics are not an accurate indication of throughput; efficiency is much lower than reported.
RDS over InfiniBand:
–The network efficiency is always 100%, at 40 Gbps today.
–Uses InfiniBand delivery capabilities that offload end-to-end checking to the InfiniBand fabric.
–Integrated in the Linux kernel; more tools will be ported to support RDS (e.g., netstat).
–Shows significant real-world application performance gains for decision support systems and mixed batch/OLTP workloads.

8 InfiniBand Considerations
Why does Oracle use InfiniBand?
–High bandwidth (1x SDR = 2.5 Gbps, 1x DDR = 5.0 Gbps, 1x QDR = 10.0 Gbps); the V2 DB machine uses 4x QDR links (40 Gbps in each direction, simultaneously). A worked bandwidth calculation follows below.
–Low latency (a few µs end-to-end, 160 ns per switch hop).
–RDMA capable: Exadata cells receive/send large transfers using RDMA, saving CPU for other operations.
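Editor's note: the rates quoted on the slide are per-lane signaling rates. A 4x link multiplies the lane rate by four, and (a detail not on the slide) SDR/DDR/QDR links use 8b/10b line encoding, so the usable data rate is 80% of the signaling rate. A small illustrative calculation:

/* Illustrative link-rate arithmetic for the figures quoted on the slide.
 * 4x QDR: 4 lanes x 10 Gbps = 40 Gbps signaling, ~32 Gbps usable data
 * after 8b/10b encoding.
 */
#include <stdio.h>

int main(void)
{
    const struct { const char *name; double lane_gbps; } rates[] = {
        { "SDR", 2.5 }, { "DDR", 5.0 }, { "QDR", 10.0 },
    };
    const int lanes = 4;                 /* 4x links, as in the V2 DB machine */
    const double encoding = 8.0 / 10.0;  /* 8b/10b line-encoding efficiency */

    for (int i = 0; i < 3; i++) {
        double raw  = lanes * rates[i].lane_gbps;
        double data = raw * encoding;
        printf("4x %s: %.1f Gbps signaling, %.1f Gbps usable data rate\n",
               rates[i].name, raw, data);
    }
    return 0;
}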

9 Architecture Overview (diagram slide)

10 #1 Price/Performance – TPC-H over 11g Benchmark
11g over DDR configuration:
–Servers: 64 x HP ProLiant BL460c (CPU: 2 x Intel Xeon X5450 quad-core)
–Fabric: Mellanox DDR InfiniBand
–Storage: native InfiniBand storage, 6 x HP Oracle Exadata
World-record clustered TPC-H performance and price/performance.
[Chart: Price / QphH@1000GB, 11g over 1GE vs. 11g over DDR – 73% TCO saving]

11 POC Hardware Configuration
Application servers:
–2x HP BL480c: 2 processors / 8 cores (Xeon X5460, 3.16 GHz), 64 GB RAM, 4x 72 GB 15K drives, NIC: HP NC373i 1 Gb
Concurrent Manager servers:
–6x HP BL480c: 2 processors / 8 cores (Xeon X5460, 3.16 GHz), 64 GB RAM, 4x 72 GB 15K drives, NIC: HP NC373i 1 Gb
Database servers:
–6x HP DL580 G5: 4 processors / 24 cores (Xeon X7460, 2.67 GHz), 256 GB RAM, 8x 72 GB 15K drives, NIC: Intel 10GbE XF SR 2-port PCIe, interconnect: Mellanox 4x PCIe InfiniBand
Storage array:
–HP XP24000: 64 GB cache / 20 GB shared memory, 60 array groups of 4 spindles (240 spindles total), 146 GB 15K Fibre Channel disk drives
Networks: 1 GbE, 10 GbE, InfiniBand, 4 Gb Fibre Channel

12 CPU Utilization
InfiniBand maximizes CPU efficiency – enables >20% higher efficiency than 10GE.
[Charts: CPU utilization, InfiniBand interconnect vs. 10GigE interconnect]

13 Disk I/O Rate
InfiniBand maximizes disk utilization – delivers 46% higher I/O traffic than 10GE.
[Charts: disk I/O rate, InfiniBand interconnect vs. 10GigE interconnect]

14 InfiniBand Delivers 63% More TPS than 10GE
TPS rates for the invoice-load use case:
InfiniBand interconnect
–Invoice Load - Load File: 6/17/09 7:48 to 7:54, duration 0:06:01, 9,899,635 records, 27,422.81 TPS
–Invoice Load - Auto Invoice: 6/17/09 8:00 to 9:54, duration 1:54:21, 9,899,635 records, 1,442.89 TPS
–Invoice Load – Total: duration 2:00:22, 9,899,635 records, 1,370.76 TPS
10 GigE interconnect
–Invoice Load - Load File: 6/25/09 17:15 to 17:20, duration 0:05:21, 7,196,171 records, 22,417.98 TPS
–Invoice Load - Auto Invoice: 6/25/09 18:22 to 20:39, duration 2:17:05, 7,196,171 records, 874.91 TPS
–Invoice Load – Total: duration 2:22:26, 7,196,171 records, 842.05 TPS
Workload:
–Nodes 1 through 4: batch processing
–Node 5: extra node, not used
–Node 6: EBS other activity
Database size: 2 TB (ASM, 5 LUNs @ 400 GB)
InfiniBand needs only 6 servers vs. the 10 servers needed by 10GE.
[Chart: TPS, 10GE vs. InfiniBand; see the arithmetic check below]
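Editor's note: the TPS figures above are simply records divided by elapsed seconds, and the headline "63% more TPS" claim can be checked the same way. This small C program (not part of the original deck) reproduces the two "Total" rows and the ratio from the durations and record counts in the table:

/* Sanity check of the slide's TPS figures: TPS = records / elapsed seconds. */
#include <stdio.h>

static double tps(long records, int h, int m, int s)
{
    return (double)records / (h * 3600 + m * 60 + s);
}

int main(void)
{
    /* "Invoice Load - Total" rows from the table above. */
    double ib_tps = tps(9899635, 2, 0, 22);   /* 2:00:22 -> ~1370.76 TPS */
    double ge_tps = tps(7196171, 2, 22, 26);  /* 2:22:26 ->  ~842.05 TPS */

    printf("InfiniBand total: %.2f TPS\n", ib_tps);
    printf("10GigE total:     %.2f TPS\n", ge_tps);
    printf("InfiniBand advantage: %.0f%% more TPS\n",
           (ib_tps / ge_tps - 1.0) * 100.0);  /* prints ~63% */
    return 0;
}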

15 Sun Oracle Database Machine
–Clustering is the architecture of the future: highest performance, lowest cost, redundant, incrementally scalable.
–The Sun Oracle Database Machine, which is based on 40 Gb/s InfiniBand, delivers a complete clustering architecture for all data management needs.

16 Sun Oracle Database Server Hardware
–8 Sun Fire X4170 database servers per rack
–8 CPU cores and 72 GB memory each
–Dual-port 40 Gb/s InfiniBand card
–Fully redundant power and cooling

17 Exadata Storage Server Hardware
Building block of the massively parallel Exadata Storage Grid:
–Up to 1.5 GB/sec raw data bandwidth per cell
–Up to 75,000 IOPS with Flash
Sun Fire X4275 Server:
–2 quad-core Intel Xeon E5540 processors
–24 GB RAM
–Dual-port 4X QDR (40 Gb/s) InfiniBand card
–Disk options: 12 x 600 GB SAS disks (7.2 TB total) or 12 x 2 TB SATA disks (24 TB total)
–4 x 96 GB Sun Flash PCIe cards (384 GB total)
Software pre-installed:
–Oracle Exadata Storage Server Software
–Oracle Enterprise Linux
–Drivers, utilities
Single point of support from Oracle: 3-year, 24 x 7, 4-hour on-site response.

18 Mellanox 40 Gb/s InfiniBand Networking
–Sun Datacenter InfiniBand Switch, 36 QSFP ports
–Fully redundant, non-blocking I/O paths from servers to storage
–2.88 Tb/sec bi-sectional bandwidth per switch
–40 Gb/s QDR, dual ports per server
–Highest bandwidth and lowest latency

19 DB Machine Protocol Stack
–Over the InfiniBand HCA, the machine runs IPoIB (carrying TCP/UDP traffic such as SQL*Net and CSS) alongside RDS, which carries iDB and Oracle IPC traffic for RAC.
–RDS provides zero loss and zero copy (ZDP).

20 What's New in V2
V1 DB machine:
–2 managed, 2 unmanaged switches
–24-port DDR switches
–15 second min. SM failover timeout
–CX4 connectors
–SNMP monitoring available
–Cell HCA in x4 PCIe slot
V2 DB machine:
–3 managed switches
–36-port QDR switches
–5 second min. SM failover timeout
–QSFP connectors
–SNMP monitoring coming soon
–Cell HCA in x8 PCIe slot

21 InfiniBand Monitoring
–SNMP alerts on Sun IB switches are coming.
–EM support for the IB fabric is coming; a Voltaire EM plugin is available (at extra cost).
–In the meantime, customers can and should monitor using IB commands from the host and the switch CLI for the various switch components (see the sketch below for host-side port queries).
Self-monitoring exists:
–Exadata cell software monitors its own IB ports
–The bonding driver monitors local port failures
–The SM monitors all port failures on the fabric
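Editor's note: as one concrete form of "IB commands from the host", the port state that host-side tools report can also be read programmatically through libibverbs. This is an illustrative sketch only (not from the deck), assuming libibverbs is installed and each HCA's port 1 is the one of interest; link with -libverbs.

/* Query the state of port 1 on each local HCA via libibverbs.
 * Build: gcc ib_ports.c -o ib_ports -libverbs
 */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) { perror("ibv_get_device_list"); return 1; }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_port_attr port;
        if (ibv_query_port(ctx, 1, &port) == 0)   /* port 1 of this HCA */
            printf("%s port 1: state=%s lid=%u\n",
                   ibv_get_device_name(devs[i]),
                   port.state == IBV_PORT_ACTIVE ? "ACTIVE" : "not active",
                   port.lid);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(devs);
    return 0;
}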

22 Scale Performance and Capacity
Scalable:
–Scales to an 8-rack database machine just by adding wires; more with external InfiniBand switches.
–Scales to hundreds of storage servers for multi-petabyte databases.
Redundant and fault tolerant:
–Failure of any component is tolerated.
–Data is mirrored across storage servers.

23 Competitive Advantage
"…everybody is using Ethernet, we are using InfiniBand, 40Gb/s InfiniBand."
– Larry Ellison, keynote at Oracle OpenWorld introducing Exadata-2 (the Sun Oracle DB machine), October 14, 2009, San Francisco

