Designing Hadoop for the Enterprise Data Center

Designing Hadoop for the Enterprise Data Center
Jacob Rapp, Cisco
Eric Sammer, Cloudera

Agenda
- Hadoop Considerations: traffic types, job patterns, network considerations, compute
- Integration: co-exist with current data center infrastructure
- Multi-tenancy: remove the "silo clusters"

Data in the Enterprise
- Data lives in a confined zone of the enterprise repository
- Long-lived, regulatory- and compliance-driven
- Heterogeneous data life cycle, many data models
- Diverse data: structured and unstructured
- Diverse data sources: subscriber-based (census, proprietary, buyers, manufacturing)
- Diverse workloads from many sources, groups, processes, and technologies
- Virtualized and non-virtualized, mostly SAN/NAS-based
(Diagram: enterprise application silos, e.g. customer DB (Oracle/SAP), social media, ERP modules, data services, sales pipeline, call center, product catalog, video conferencing, collaboration and office apps, records and document management, VoIP, executive reports.)
Scaling and integration dynamics are different:
- Data warehousing (structured) with diverse repositories, plus unstructured data
- A few hundred to a thousand nodes, a few PB
- Integration, policy, and security challenges
- Each app, group, and technology is limited in data generation and consumption, servicing confined domains

Enterprise Data Center Infrastructure
(Diagram: reference enterprise data center topology.)
- WAN edge layer
- Core layer (LAN and SAN): Nexus 7000 10 GE core; MDS 9500 SAN director
- Aggregation and services layer: Nexus 7000 10 GE with vPC+ and FabricPath at the L3/L2 boundary, plus network services
- Access layer: Nexus 5500 (FCoE) with Nexus 2232/2248TP-E fabric extenders top-of-rack, B22 FEX for HP C-class blades, CBS 31xx blade switches, Nexus 7000 end-of-row, Nexus 3000 top-of-rack, UCS with FCoE, and bare-metal 1G/10G servers
- SAN edge: MDS 9200/9100
- Server access: 1 GbE with 4/8 Gb FC via dual HBAs (SAN A / SAN B), 10 Gb DCB/FCoE, or 10 GbE with 4/8 Gb FC via dual HBAs

Hadoop Cluster Design & Network Architecture

Validated 96-Node Hadoop Cluster
Two topologies were validated, each with a name node and 96 data nodes (Cisco UCS C200, single NIC):
- Nexus 7K-N3K topology: data nodes 1-48 and 49-96 behind Nexus 3000 top-of-rack switches, aggregated by a Nexus 7000
- Traditional DC design (Nexus 55xx/2248): data nodes behind Nexus 2248TP-E fabric extenders, aggregated by Nexus 5548s
Hadoop framework: Apache 0.20.2 on Linux 6.2; slots: 10 maps and 2 reducers per node
Compute: UCS C200 M2; 12 cores; 2 x Intel Xeon X5670 @ 2.93 GHz; disk: 4 x 2 TB (7.2K RPM); network: 1G LOM, 10G Cisco UCS P81E
Network: three racks, each with 32 nodes; distribution layer: Nexus 7000 or Nexus 5000; ToR: FEX or Nexus 3000; 2 FEX per rack; each rack with 32 single- or dual-attached hosts
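
The per-node slot counts above are plain TaskTracker settings in stock Hadoop 0.20. A minimal sketch, using a hypothetical staging path, of the mapred-site.xml fragment they correspond to (it belongs inside the <configuration> element on each data node):

```sh
# Hedged sketch: writes the slot-count fragment to a staging file; the
# property names are the standard Hadoop 0.20 TaskTracker settings.
cat > /tmp/slot-properties.xml <<'EOF'
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>  <!-- 10 map slots per data node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>   <!-- 2 reduce slots per data node -->
</property>
EOF
```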

Hadoop Job Patterns and Network Traffic

Job Patterns
(Diagrams: map-to-reduce flow and ingress vs. egress data set size for each pattern.)
- Analyze: ingress vs. egress data set 1:0.3
- Extract Transform Load (ETL): ingress vs. egress data set 1:1
- Explode: ingress vs. egress data set 1:2
The time the reducers start depends on mapred.reduce.slowstart.completed.maps. It does not change the amount of data sent to the reducers, but it may change the timing of when that data is sent.
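
That slowstart knob is an ordinary per-job parameter in Hadoop 0.20; a hedged sketch (jar name and HDFS paths are assumptions) of delaying reducer launch:

```sh
# Start reducers only after 80% of maps complete (the 0.20 default is 0.05).
# This shifts *when* shuffle data moves on the network, not *how much* moves.
hadoop jar hadoop-examples.jar terasort \
  -D mapred.reduce.slowstart.completed.maps=0.80 \
  /data/tera-in /out/tera
```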

Traffic Types
- Small flows/messaging: admin-related, heartbeats, keep-alives, delay-sensitive application messaging
- Small to medium incast: Hadoop shuffle
- Large flows: HDFS ingest
- Large incast: Hadoop replication

Many-to-Many Traffic Pattern
(Diagram: map and reduce traffic. The NameNode, JobTracker, and ZooKeeper coordinate; map tasks 1..N feed reducers 1..N through the shuffle in a many-to-many pattern, and reducer output is then replicated into HDFS.)

Job Patterns and Network Utilization
Job patterns have a varying impact on network utilization:
- Analyze: simulated with Shakespeare WordCount
- Extract Transform Load (ETL): simulated with Yahoo! TeraSort
- Extract Transform Load (ETL) with output replication: simulated with Yahoo! TeraSort, replicating the output
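
A rough sketch, with hypothetical HDFS paths, of how these three simulations are typically launched from the stock Hadoop 0.20 examples jar; the replication flag in the last run is an assumption about how the output copy count was raised:

```sh
# Analyze (~1:0.3): WordCount over a small text corpus.
hadoop jar hadoop-examples.jar wordcount /data/shakespeare /out/wc

# ETL (~1:1): TeraGen writes 10^10 100-byte rows (1 TB); TeraSort rewrites it.
hadoop jar hadoop-examples.jar teragen 10000000000 /data/tera-in
hadoop jar hadoop-examples.jar terasort /data/tera-in /out/tera

# ETL with output replication (~1:2): same sort, but with replicated output.
# (Assumes the job honors dfs.replication; some TeraSort builds pin output
# replication to 1, in which case the output must be re-replicated afterward.)
hadoop jar hadoop-examples.jar terasort \
  -D dfs.replication=2 \
  /data/tera-in /out/tera-rep
```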

Data Locality in HDFS
Data locality: the ability to process data where it is locally stored.
Note: during the map phase, the JobTracker attempts to use data locality, scheduling map tasks on the nodes where the data is locally stored. This is not perfect; it depends on which data nodes hold the data. It is also a consideration when choosing the replication factor: more replicas tend to create a higher probability of data locality.
Map tasks: an initial traffic spike for non-local data, since a task may sometimes be scheduled on a node that does not have the data available locally.
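
A hedged illustration (paths hypothetical) of the replication lever mentioned above, using standard HDFS shell commands:

```sh
# Raise the replication factor: more replicas mean a higher probability that
# a map task finds a local copy of its input block.
hadoop fs -setrep -R 3 /data/tera-in

# Inspect where the block replicas actually landed.
hadoop fsck /data/tera-in -blocks -locations -racks | head -40
```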

Multi-Job Cluster Characteristics
Hadoop clusters are generally multi-use, and background activity can affect any single job's completion time: a given cluster may be running many different types of jobs, importing data into HDFS, and so on.
Example view of 24-hour cluster use: a large ETL job overlaps with medium and small ETL jobs and many small BI jobs while data is imported into HDFS. (Chart: blue lines are ETL jobs, purple lines are BI jobs.)

Map-to-Reducer Ratio Impact on Job Completion
A 1 TB file with 128 MB blocks yields 7,813 map tasks. Job completion time is directly related to the number of reducers, and average network buffer usage falls as the number of reducers decreases, and vice versa.
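
The map-task count falls out of the block math: 10^12 bytes / (128 x 10^6 bytes per block) = 7,812.5, so 7,813 map tasks. The reducer count is the free variable; a hypothetical sweep (paths assumed) matching the 96/48/24 comparison on the next slide:

```sh
# Same 1 TB input each time (7,813 map tasks); only the reducer count varies.
for R in 96 48 24; do
  hadoop jar hadoop-examples.jar terasort \
    -D mapred.reduce.tasks=$R \
    /data/tera-in /out/tera-$R
done
```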

Network Traffic with Variable Reducers
Network traffic decreases as fewer reducers are available. (Charts: runs with 96, 48, and 24 reducers.)

Summary
- Running a single ETL or Explode job pattern on the entire cluster is the most network-intensive case; Analyze jobs are the least network-intensive
- A mixed environment of multiple jobs is less network-intensive than one single job, due to sharing of resources
- A large number of reducers can create load on the network, but the effect depends on the job pattern and on when the reducers start

Integration into the Data Center

Integration Considerations
Network attributes:
- Architecture
- Availability
- Capacity, scale, and oversubscription
- Flexibility
- Management and visibility

Data Node Speed Differences
Generally 1G is used, largely due to cost/performance trade-offs, though 10GE can provide benefits depending on the workload. (Charts: a single 1GE link runs 100% utilized, dual 1GE links about 75% utilized, and a 10GE link about 40% utilized.) 10G shows a reduced traffic spike and a smoother job completion time. Multiple 1G or 10G links can be bonded together to increase not only bandwidth but also resiliency.

Availability: Single-Attached vs. Dual-Attached Nodes
- Dual attachment leaves no single point of failure from the network viewpoint, and no impact on job completion time
- NIC bonding is configured in Linux using the LACP bonding mode, giving effective load-sharing of traffic flows across the two NICs
- Recommended: change the hashing to src-dst-ip-port (on both the network and the Linux NIC bond) for optimal load-sharing
Failure intensity differs between smaller and bigger jobs. Map tasks execute in parallel, so the unit time per map task per node stays the same and all nodes finish at roughly the same time. During a failure, however, a set of map tasks remains pending until all the other nodes finish their assigned tasks; only then are the leftover map tasks reassigned by the JobTracker. The unit time to finish those leftover tasks is the same as for the others; they simply do not run in parallel with the rest, which can roughly double job completion time. This is the worst case with TeraSort; other workloads may have variable completion times.
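
A minimal sketch of the Linux side of this bond in RHEL/CentOS 6 style (device names and addressing are hypothetical): mode=802.3ad is LACP, and xmit_hash_policy=layer3+4 hashes on source/destination IP and L4 port, i.e. the recommended src-dst-ip-port scheme. The switch-side port-channel hash should be set to match.

```sh
cat > /etc/sysconfig/network-scripts/ifcfg-bond0 <<'EOF'
DEVICE=bond0
BOOTPROTO=static
IPADDR=10.0.0.11
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
EOF

cat > /etc/sysconfig/network-scripts/ifcfg-eth0 <<'EOF'
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
EOF
# Repeat the slave stanza for eth1, then: service network restart
```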

1GE vs. 10GE Buffer Usage
Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer. (Charts: buffer usage during the shuffle phase and during output replication.) By moving to 10GE, the data node has a wider pipe to receive data, lessening the need for buffering in the network, since the total aggregate transfer rate and amount of data do not increase substantially. This is due, in part, to the limits of I/O and compute capabilities.

Network Latency
Generally, network latency does not represent a significant factor for Hadoop clusters, although consistent latency is important. Note: network latency differs from application latency. Optimization in the application stack can decrease application latency, which can potentially bring a significant benefit.

Integration Considerations: Findings and Goals
Findings:
- 10G and/or dual-attached servers provide consistent job completion time and better buffer utilization
- 10G reduces bursts at the access layer
- A dual-attached server (1G or 10G) is the recommended design; 10G for future-proofing
- Rack failure has the biggest impact on job completion time
- Hadoop does not require a non-blocking network
- Latency does not matter much in Hadoop workloads
Goals:
- Extensive validation of Hadoop workloads on the reference architecture
- Make it easy for the enterprise; demystify the network for Hadoop deployments
- Integration with the enterprise through efficient choices of network topology and devices
More details:
http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design
http://youtu.be/YJODsK0T67A

Multi-tenant Environments

Various Multi-tenant Environments
- Hadoop + HBase: need to understand traffic patterns
- Job based: scheduling dependent
- Department based: permissions and scheduling dependent

Hadoop + HBase
(Diagram: clients read from and update HBase region servers while MapReduce maps 1..N shuffle to reducers 1..N; region-server major compactions and reducer output replication all converge on the shared HDFS layer.)

HBase During Major Compaction
Read/update latency comparison of non-QoS vs. QoS policy: roughly 45% read-latency improvement with a network QoS policy that prioritizes HBase update/read operations. (Charts: latency comparison and switch buffer usage.)
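
The deck does not show the QoS configuration it used. Purely as a hypothetical NX-OS sketch of the idea, one could classify HBase region-server traffic (default TCP port 60020) and mark it into a qos-group that a queuing policy then protects:

```
! Hypothetical sketch only, not the tested configuration.
ip access-list HBASE-RS
  permit tcp any any eq 60020
  permit tcp any eq 60020 any
class-map type qos match-any HBASE
  match access-group name HBASE-RS
policy-map type qos PRIORITIZE-HBASE
  class HBASE
    set qos-group 2
interface Ethernet1/1
  service-policy type qos input PRIORITIZE-HBASE
```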

HBase + Hadoop MapReduce
Read/update latency comparison of non-QoS vs. QoS policy: roughly 60% read-latency improvement with a network QoS policy that prioritizes HBase update/read operations while MapReduce runs concurrently. (Charts: latency comparison and switch buffer usage.)

THANK YOU FOR LISTENING
Cisco Unified Data Center:
- Unified Fabric: highly scalable, secure network fabric (www.cisco.com/go/nexus)
- Unified Computing: modular, stateless computing elements (www.cisco.com/go/ucs)
- Unified Management: automated management that manages enterprise workloads (http://www.cisco.com/go/workloadautomation)
Cisco.com Big Data: www.cisco.com/go/bigdata