© 2013 Mellanox Technologies 1 NoSQL DB Benchmarking with high performance Networking solutions WBDB, Xian, July 2013.



Slide 2: Leading Supplier of End-to-End Interconnect Solutions
Comprehensive end-to-end InfiniBand and Ethernet portfolio: ICs, adapter cards, switches/gateways, host/fabric software, and cables.
[Diagram: Virtual Protocol Interconnect across server/compute, switch/gateway, and front-/back-end storage; link types shown: 56G InfiniBand, 56G IB & FCoIB, 10/40/56GbE, 10/40/56GbE & FCoE, and Fibre Channel.]

Slide 3: Motivation to Accelerate Data Analytics
- Data analysis requires a faster network
  - The Hadoop MapReduce framework is a network-intensive workload: mapped data is shuffled between the nodes in the cluster
  - Data replication: a high-availability event triggers the movement of multiple terabytes of data
- Provide higher data value
  - Expose the low-latency capabilities of SSDs
  - Better server/CPU utilization
Big Data applications require a high-bandwidth, low-latency interconnect.
* Data source: Intersect360 Research, 2012, IT and data scientists survey
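To put shuffle traffic and "multiple terabytes" of re-replication in perspective, here is a back-of-the-envelope sketch. It assumes full line rate with no protocol overhead and uniformly partitioned shuffle keys; the 2 TB volume is a hypothetical figure, not a measurement from these benchmarks.

```python
"""Back-of-the-envelope network time for Hadoop shuffle and re-replication.

Assumptions (illustrative, not measured): links run at full line rate,
protocol overhead is ignored, and shuffle keys are spread uniformly.
"""


def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Ideal time to push num_bytes through one link of link_gbps Gb/s."""
    return num_bytes * 8 / (link_gbps * 1e9)


def shuffled_bytes_over_network(map_output_bytes: float, nodes: int) -> float:
    """With uniform partitioning, a reducer finds only 1/nodes of its input
    on its own node; the remaining (nodes - 1)/nodes crosses the network."""
    return map_output_bytes * (nodes - 1) / nodes


if __name__ == "__main__":
    volume = 2e12  # hypothetical: 2 TB to move after a node failure
    for gbps in (1, 10, 40, 56):
        print(f"{gbps:>2} Gb/s: {transfer_seconds(volume, gbps):8.1f} s")
    # On a hypothetical 5-node cluster, 80% of map output leaves its node.
    print(shuffled_bytes_over_network(volume, 5))
```

Under these assumptions, 2 TB over a single 1 Gb/s link takes about 4.4 hours, versus under 5 minutes at 56 Gb/s; a real cluster spreads the transfer over many links in parallel.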

Slide 4: Cassandra, Update Latency
- The Cassandra database supports updates
- Latency factors:
  - Commit-log settings
  - Workload
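Commit-log settings matter because they decide whether an update waits for the disk. The toy model below is not Cassandra code; it only mimics the idea behind Cassandra's `commitlog_sync` setting. In "periodic" mode a write is acknowledged once it is appended to the log (the background fsync is omitted here); in "batch" mode the log is fsynced before acknowledging (real Cassandra groups writes within a batch window, which this sketch does not). `ToyNode` is a hypothetical name.

```python
import os
import tempfile
import time


class ToyNode:
    """Hypothetical node: a commit log on disk plus an in-memory memtable."""

    def __init__(self, log_path: str, sync_mode: str = "periodic"):
        if sync_mode not in ("periodic", "batch"):
            raise ValueError("sync_mode must be 'periodic' or 'batch'")
        self.sync_mode = sync_mode
        self.memtable = {}  # key -> latest value
        self.log = open(log_path, "ab")

    def update(self, key: str, value: str) -> float:
        """Apply one update and return its latency in seconds."""
        start = time.perf_counter()
        self.log.write(f"{key}={value}\n".encode())  # append to the commit log
        if self.sync_mode == "batch":
            self.log.flush()
            os.fsync(self.log.fileno())  # durable on disk before acknowledging
        # "periodic": acknowledged now; a real system would fsync later
        self.memtable[key] = value
        return time.perf_counter() - start

    def close(self) -> None:
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for mode in ("periodic", "batch"):
            node = ToyNode(os.path.join(tmp, f"{mode}.log"), mode)
            lat = [node.update(f"k{i}", "v") for i in range(200)]
            node.close()
            print(f"{mode:>8}: mean {sum(lat) / len(lat) * 1e6:.1f} us")
```

On most machines the batch run is noticeably slower per update, since every write pays for an fsync; the exact numbers depend on the storage device.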

Slide 5: Cassandra, Read Latency
- Cassandra database reads
- Latency factors:
  - Media used
  - Workload
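Latency charts like these are usually summarized by percentiles as well as the mean, because a few reads that miss memory and hit the media dominate the tail. A minimal sketch using the nearest-rank percentile definition; the sample values are made up for illustration, not results from this deck.

```python
"""Summarize per-request read latencies with nearest-rank percentiles."""
import math


def percentile(samples: list[float], p: float) -> float:
    """Smallest sample value such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    return {
        "mean": sum(samples) / len(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical read latencies in ms: mostly memory hits, a few media reads.
    reads_ms = [0.4] * 90 + [8.0] * 9 + [25.0]
    print(summarize(reads_ms))
```

Here the median stays at 0.4 ms while p95 and p99 jump to the media latency, which is why the choice of media shows up most clearly in the tail.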

Slide 6: System Used for Cassandra Benchmark
- 5 nodes in the ring
- 64GB RAM: 8 x 8GB DDR3 1333MHz
- 2 x E Cores per socket
- 5 x Seagate® Constellation® ES SATA 6Gb/s 2TB hard drives, 7200 RPM
- NIC: Mellanox Technologies MT27500 Family [ConnectX-3] 10Gb Ethernet, FW_VER=
- Switch: SX1036
- OS: RH 6.3, MLNX_OFED_LINUX
- Apache Cassandra, 2 seeds
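For readers less familiar with the "ring", this sketch shows how a 5-node ring could assign tokens and pick the replicas for a key. Assumptions: one token per node, evenly spaced over the Murmur3Partitioner range [-2^63, 2^63 - 1], and SimpleStrategy-style placement (the owning node plus the next nodes clockwise). The key hash is an MD5-based stand-in, not Murmur3, so its values will not match a real cluster; the function names are hypothetical.

```python
import bisect
import hashlib

MIN_TOKEN = -(2**63)
RING_SIZE = 2**64


def initial_tokens(num_nodes: int) -> list[int]:
    """Evenly spaced tokens: node i gets i * 2**64 / N - 2**63."""
    return [i * RING_SIZE // num_nodes + MIN_TOKEN for i in range(num_nodes)]


def key_token(key: str) -> int:
    """Stand-in hash mapped into the signed 64-bit token range."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def replicas(key: str, tokens: list[int], rf: int = 3) -> list[int]:
    """Node indexes holding `key`: the first node whose token is >= the key's
    token (wrapping past the largest token), plus the next rf - 1 nodes."""
    start = bisect.bisect_left(tokens, key_token(key)) % len(tokens)
    return [(start + i) % len(tokens) for i in range(min(rf, len(tokens)))]


if __name__ == "__main__":
    ring = initial_tokens(5)
    for name in ("user:1", "user:2", "user:3"):
        print(name, "->", replicas(name, ring))
```

Seed nodes only help new nodes discover the cluster through gossip; in this model they play no part in data placement.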

Slide 7: E-DFSIO, Showing the Power of HDFS
- SSDs are becoming the de facto standard in HDFS deployments
  - Read capability is a critical factor for application performance
- E-DFSIO, part of Intel's HiBench test suite, profiles aggregate throughput on the cluster
  - A 1GbE network impedes any performance benefit from SSD deployment
[Chart: Unlocking the Power of SSDs in Hadoop Environment]
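The 1GbE point follows from simple bottleneck arithmetic. In the simplified model below (an assumption, not how E-DFSIO computes its figures), each node's remote-read rate is capped by the slower of its local media and its NIC, and the cluster rate is that cap times the node count behind a non-blocking switch. The HDD and SSD rates are hypothetical round figures.

```python
BYTES_PER_SEC_PER_GBPS = 1e9 / 8  # 1 Gb/s of line rate = 125 MB/s


def node_read_mb_s(media_mb_s: float, nic_gbps: float) -> float:
    """Per-node rate for remote reads: capped by the slower of media and NIC."""
    nic_mb_s = nic_gbps * BYTES_PER_SEC_PER_GBPS / 1e6
    return min(media_mb_s, nic_mb_s)


def cluster_read_mb_s(nodes: int, media_mb_s: float, nic_gbps: float) -> float:
    """Aggregate rate, assuming a non-blocking switch and no other limits."""
    return nodes * node_read_mb_s(media_mb_s, nic_gbps)


if __name__ == "__main__":
    hdd_array = 5 * 120.0  # hypothetical: 5 HDDs at ~120 MB/s each
    ssd = 2000.0           # hypothetical SSD configuration per node
    for nic in (1, 10, 40):
        print(f"{nic:>2} Gb/s NIC: HDD {cluster_read_mb_s(5, hdd_array, nic):7.0f} MB/s,"
              f" SSD {cluster_read_mb_s(5, ssd, nic):7.0f} MB/s")
```

At 1 Gb/s both configurations come out identical (125 MB/s per node), so the SSDs add nothing; only a faster network lets the media difference show.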

Slide 8: HBase Benchmarking, Update Latency
- Updates are made to server memory
  - Extremely low latency for HBase
  - The Java GC policy hurts performance at high throughput
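Why updates made to server memory are fast can be seen in a toy memstore. This is not HBase code: updates land in an in-memory map, and when its approximate size passes a threshold it is flushed as an immutable, sorted "store file". The threshold is a tiny hypothetical value (the real HBase knob is `hbase.hregion.memstore.flush.size`), and `ToyMemstore` is a hypothetical name.

```python
class ToyMemstore:
    """Hypothetical, simplified stand-in for one region's memstore."""

    def __init__(self, flush_bytes: int):
        self.flush_bytes = flush_bytes  # flush threshold (tiny, for illustration)
        self.data = {}                  # row -> value, held in memory
        self.size = 0                   # approximate bytes buffered
        self.store_files = []           # each flush: a sorted list of (row, value)

    def put(self, row: bytes, value: bytes) -> None:
        """An update only touches memory, unless it triggers a flush."""
        old = self.data.get(row)
        if old is not None:
            self.size -= len(row) + len(old)
        self.data[row] = value
        self.size += len(row) + len(value)
        if self.size >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        """Persist buffered rows as an immutable, sorted file and start empty."""
        if self.data:
            self.store_files.append(sorted(self.data.items()))
        self.data = {}
        self.size = 0


if __name__ == "__main__":
    store = ToyMemstore(flush_bytes=64)
    for i in range(20):
        store.put(f"row{i:02d}".encode(), b"value")
    print(len(store.store_files), "flushes,", store.size, "bytes still in memory")
```

In the real server these buffers live on the JVM heap, so sustained high update throughput means large heaps and many short-lived objects, which is one plausible reason the GC policy starts to hurt.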

Slide 9: HBase Benchmarking, Read Latency
- Hitting the media capabilities
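"Hitting the media capabilities" can be read through a simple expected-latency model (an assumption, not from the slides): a read is served from memory (memstore or block cache) with probability `hit_ratio`, otherwise from the storage media. The latencies in the example are hypothetical order-of-magnitude values, not measurements.

```python
def expected_read_ms(hit_ratio: float, memory_ms: float, media_ms: float) -> float:
    """Mean latency when a fraction hit_ratio of reads is served from memory."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * memory_ms + (1.0 - hit_ratio) * media_ms


if __name__ == "__main__":
    for hit in (0.99, 0.9, 0.5):
        hdd = expected_read_ms(hit, memory_ms=0.1, media_ms=8.0)
        ssd = expected_read_ms(hit, memory_ms=0.1, media_ms=0.2)
        print(f"hit {hit:.2f}: HDD {hdd:.2f} ms, SSD {ssd:.2f} ms")
```

As the hit ratio falls, the mean converges on the media latency, so once the working set outgrows memory the media sets the read latency.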

Slide 10: System Used for HBase Benchmarks
- 4 region servers, 1 master, 3 ZooKeeper quorum servers
- 64GB RAM: 8 x 8GB DDR3 1333MHz
- 2 x E Cores per socket
- 5 x Seagate® Constellation® ES SATA 6Gb/s 2TB hard drives, 7200 RPM
- NIC: Mellanox Technologies MT27500 Family [ConnectX-3] 10Gb Ethernet, FW_VER=
- Switch: SX1036
- OS: RH 6.3, MLNX_OFED_LINUX
- Apache HBase, ZooKeeper 3.4.5, Apache Hadoop
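The three ZooKeeper quorum servers in this setup follow from majority arithmetic: ZooKeeper keeps serving only while a strict majority of the ensemble is up. A small sketch of that calculation:

```python
def majority(ensemble: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Servers that can fail while a majority survives."""
    return ensemble - majority(ensemble)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum {majority(n)}, tolerates {tolerated_failures(n)}")
```

Three servers tolerate one failure; a fourth adds no extra tolerance, which is why odd ensemble sizes are the usual choice.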

Slide 11: Hadoop Acceleration
- EMC 1000-node analytic platform
  - Accelerates the industry's Hadoop development
  - 24 petabytes of physical storage
  - Mellanox VPI solutions
- Test Drive Your Big Data
- 2X faster Hadoop job run-time
- High throughput, low latency, RDMA: critical for ROI

Slide 12: The Great Things in the Hadoop Distributed File System
- HDFS is a block storage solution
- Block size can be modified to handle very large files efficiently
- Inherent reliability: no need for a high-end storage solution to make sure the data is there!
- Tuned for Hadoop workloads: write once, read many
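To make "block storage with a modifiable block size" concrete, here is a minimal sketch of how a file splits into fixed-size blocks, assuming only the last block may be shorter (HDFS does not pad it). The block size is configured cluster-wide or per file (`dfs.blocksize` in Hadoop 2.x, `dfs.block.size` in 1.x); the sizes below are examples.

```python
MB = 1024 * 1024


def split_into_blocks(file_bytes: int, block_bytes: int) -> list[tuple[int, int]]:
    """Return (offset, length) for each block of a file."""
    if block_bytes <= 0:
        raise ValueError("block size must be positive")
    return [(off, min(block_bytes, file_bytes - off))
            for off in range(0, file_bytes, block_bytes)]


if __name__ == "__main__":
    one_tb = 1024 * 1024 * MB
    for block_mb in (64, 128, 256):
        blocks = split_into_blocks(one_tb, block_mb * MB)
        print(f"{block_mb:>3} MB blocks: {len(blocks)} blocks for a 1 TB file")
```

Doubling the block size halves the number of blocks the NameNode must track and, with default input splits, the number of map tasks, which is why larger blocks suit very large files.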

Slide 13: The Less Great Things in HDFS
- It's hard to manage the different settings needed to match the right nodes to the right capabilities
- Ingesting and extracting data requires additional tools
- Small-file or latency-sensitive workloads
- Default 3x replication
- Metadata server failure
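The cost of default 3x replication and of small files can be estimated roughly. Assumptions: every replica stores the full block, each file and each block is one NameNode object, and about 150 bytes of NameNode heap per object is a commonly quoted rule of thumb rather than an exact figure.

```python
BYTES_PER_NAMESPACE_OBJECT = 150  # rule-of-thumb assumption, not exact


def raw_capacity_bytes(logical_bytes: int, replication: int = 3) -> int:
    """Disk space consumed when every block is stored `replication` times."""
    return logical_bytes * replication


def namenode_heap_bytes(num_files: int, file_bytes: int, block_bytes: int) -> int:
    """Approximate NameNode heap: one object per file plus one per block."""
    blocks_per_file = -(-file_bytes // block_bytes)  # ceiling division
    return num_files * (1 + blocks_per_file) * BYTES_PER_NAMESPACE_OBJECT


if __name__ == "__main__":
    MiB, TiB = 1024**2, 1024**4
    block = 128 * MiB
    print(f"1 TiB with 3x replication uses {raw_capacity_bytes(TiB) // TiB} TiB of disk")
    small = namenode_heap_bytes(TiB // MiB, MiB, block)                  # 1 MiB files
    large = namenode_heap_bytes(TiB // (1024 * MiB), 1024 * MiB, block)  # 1 GiB files
    print(f"NameNode heap for 1 TiB: {small / MiB:.0f} MiB as 1 MiB files, "
          f"{large / MiB:.2f} MiB as 1 GiB files")
```

Under these assumptions the same terabyte costs roughly 300 MiB of NameNode heap when stored as 1 MiB files, but only about 1.3 MiB as 1 GiB files.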

Slide 14: Local Disks – The Common Practice

Slide 15: Other Distributed Storage Solutions for Hadoop, Really?!

Slide 16: OrangeFS as a Hadoop Storage Solution

Slide 17: Lustre as a Hadoop Storage Solution
Source: Map/Reduce on Lustre, Hadoop Performance in HPC Environments, Nathan Rutman, Senior Architect, Networked Storage Solutions, Xyratex

Slide 18: Ceph as a Hadoop Storage Solution
- Generating a lot of interest since the Ceph kernel client was merged into the Linux kernel
  - Object-based parallel file system
  - Scalable metadata server
  - Each file can specify its own striping strategy and object size
  - Automatic rebalancing of data with minimal data movement
  - A Hadoop module for integrating Ceph has been in development since the 0.12 release
- Benchmarks on Ceph are still a work in progress
  - We are currently working on running benchmarks on Ceph. Stay tuned!
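"Each file can specify its own striping strategy and object size" refers to Ceph's per-file layout fields `stripe_unit`, `stripe_count`, and `object_size`. The sketch below maps a byte offset in a file to an object number and an offset within that object, following the round-robin striping scheme described in Ceph's documentation as we understand it; treat it as an illustration, not Ceph's code. `Layout.locate` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    stripe_unit: int   # bytes written to one object before moving to the next
    stripe_count: int  # number of objects a stripe is spread across
    object_size: int   # max bytes per object; a multiple of stripe_unit

    def __post_init__(self):
        if self.stripe_unit <= 0 or self.stripe_count <= 0:
            raise ValueError("stripe_unit and stripe_count must be positive")
        if self.object_size < self.stripe_unit or self.object_size % self.stripe_unit:
            raise ValueError("object_size must be a positive multiple of stripe_unit")

    def locate(self, offset: int) -> tuple[int, int]:
        """Return (object_number, offset_within_object) for a file offset."""
        unit = offset // self.stripe_unit           # which stripe unit overall
        stripe = unit // self.stripe_count          # which stripe (row)
        pos = unit % self.stripe_count              # object within the object set
        units_per_object = self.object_size // self.stripe_unit
        object_set = stripe // units_per_object     # which group of stripe_count objects
        object_no = object_set * self.stripe_count + pos
        in_object = (stripe % units_per_object) * self.stripe_unit + offset % self.stripe_unit
        return object_no, in_object


if __name__ == "__main__":
    tiny = Layout(stripe_unit=4, stripe_count=2, object_size=8)  # toy sizes
    print([tiny.locate(off) for off in range(0, 24, 4)])
```

With `stripe_count` of 1 and `stripe_unit` equal to `object_size` (a common default of 4 MB), the mapping reduces to plain chunking: offset 10 MB lands 2 MB into object 2.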

Slide 19: Thank You