Multi-Data-Center Hadoop in a Snap
Dr. Konstantin Boudnik, Vice President, Open Source Development


1 Multi-Data-Center Hadoop in a Snap
Dr. Konstantin Boudnik, Vice President, Open Source Development

2 My Background
● 15-year Sun Microsystems veteran: JVM, distributed systems
● Vice President, Apache Bigtop
● Committer, PMC member, and contributor to various ASF projects
● Member of the Apache Incubator PMC (IPMC)
● Early Hadoop committer

3 WANdisco Background
● WANdisco: Wide Area Network Distributed Computing
  – Enterprise-ready, high-availability software solutions that enable globally distributed organizations to meet today's data challenges of secure storage, scalability, and availability
● Leader in tools for software engineers
  – Subversion
  – Apache Software Foundation sponsor
● Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
● US patent for active-active replication technology granted, November 2012
● Global locations
  – San Ramon (CA)
  – Chengdu (China)
  – Tokyo (Japan)
  – Boston (MA)
  – Sheffield (UK)
  – Belfast (UK)

4 Customers

5 Non-Stop Hadoop
● Non-Intrusive Plugin
● Provides Continuous Availability
● In the LAN / Across the WAN
● Active/Active

6 3 Key Problems for Multi-Cluster Hadoop (LAN / WAN)

7 Enterprise-Ready Hadoop
Characteristics of Mission-Critical Applications
● Require 100% Uptime of Hadoop
  – SLAs, Regulatory Compliance
● Require HDFS to Be Deployed Globally
  – Share Data Between Data Centers
  – Data Is Consistent, Not Eventually Consistent
● Ease Administrative Burden
  – Reduce Operational Complexity
  – Simplify Disaster Recovery
  – Lower RTO/RPO
● Allow Maximum Utilization of Resources
  – Within the Data Center
  – Across Data Centers

8 Breaking Away from Active/Passive: What's in a NameNode
Single Standby
● Inefficient utilization of resources
  – JournalNodes
  – ZooKeeper Nodes
  – Standby Node
● Performance Bottleneck
● Still tied to the beeper
● Limited to LAN scope
Active / Active
● All resources utilized
  – Only NameNode configuration
  – Scale as the cluster grows
  – All NameNodes active
● Load balancing
● Set resiliency (# of active NameNodes)
● Global Consistency

9 Breaking Away from Active/Passive: What's in a Data Center
Standby Data Center
● Idle Resources
  – Single Data Center Ingest
  – Disaster Recovery Only
● One-way synchronization
  – DistCp
● Error-Prone
  – Clusters can diverge over time
● Difficult to scale beyond 2 Data Centers
  – Complexity of sharing data increases
Active / Active
● DR Resources Available
  – Ingest at all Data Centers
  – Run Jobs in both Data Centers
● Replication is Multi-Directional
  – Active/active
● Absolute Consistency
  – Single HDFS spans locations
● 'N' Data Center support
  – Global HDFS allows appropriate data to be shared

10 One-Cluster Approach
● Example Applications
  – HBase
  – Real-Time Query
  – MapReduce
● Poor Resource Management
  – Data Locality Issues
  – Network Use
  – Complex
Multiple Clusters

11 Creating Multiple Clusters
● Example Applications
  – HBase
  – Real-Time Query
  – MapReduce
● Need to share data between clusters
  – DistCp / Stale Data
  – Inefficient use of storage and/or network
  – Some clusters may not be available
Multiple Clusters

12 Cluster Zones
● Zoning for Optimal Efficiency
● 100% HDFS Consistency

13 Multi-Data-Center Hadoop
● Disaster Recovery
● WAN Replication
● Absolute Consistency
● Maximum Resource Use
● Lower Recovery Time/Point Objectives (RTO/RPO)
● Replicate Only What You Want
● Better Utilization of Power/Cooling
● Lower TCO
● LAN-Speed Performance

14 Architecture of Non-Stop Hadoop

15 Technical Use Cases
● Eliminate Performance Bottleneck
  – HBase issues
● Multi-Data-Center Ingest
  – Information doesn't need to be sent to one DC and then copied back to the other using DistCp
  – Parallel ingest methods don't require redirected data streams
  – Ingest data at, or close to, the source
  – Global analysis (logs, click streams, etc.)
● Cluster Zones
  – Efficient use of resources based on application profile
  – HBase, MapReduce, Spark, etc.
● Maximize Data Center Resource Utilization
  – All data centers can be used to run different jobs concurrently
● Disaster Recovery
  – Data is as current as possible (no periodic syncs)
  – Virtually zero downtime to recover from a regional data center failure
  – Regulatory compliance

16 Non-Stop Hadoop Demonstration

17 Q & A

18 Thank you
