1
Part III: Big Data Analysis Tools (YARN)
Yuan Xue (yuan.xue@vanderbilt.edu)
http://www.slideshare.net/hortonworks/apache-hadoop-yarn-enabling-nex
2
Motivation
Review of MapReduce (MR1)
3
Motivation – Limitations of MR1
Scalability: maximum cluster size of 4,000 nodes; maximum of 40,000 concurrent tasks; coarse synchronization in the JobTracker
Availability: a JobTracker failure kills all queued and running jobs
Resource utilization: hard partition of resources into map and reduce slots leads to low resource utilization
No support for alternate paradigms and services: iterative applications implemented using MapReduce are 10x slower
4
From MR1 to MR2 -- YARN
Apache Hadoop YARN is an effort to take Apache Hadoop beyond MapReduce for data processing. YARN was originally part of the Hadoop MapReduce project and is now poised to stand on its own as a sub-project of Hadoop.
5
YARN Overview
YARN stands for "Yet Another Resource Negotiator".
6
YARN architecture
Application: an application is a job submitted to the framework. Example – a MapReduce job.
Container: the basic unit of allocation. Containers enable fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.), e.g. container_0 = 2 GB, 1 CPU; container_1 = 1 GB, 6 CPUs. Containers replace the fixed map/reduce slots of MR1.
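The two example containers above can be written directly against YARN's Java client API. The following is a minimal sketch, not taken from the slides: it assumes the hadoop-yarn-client library is on the classpath, and the class and variable names are illustrative only.

```java
// Illustrative sketch: expressing fine-grained container requests with YARN's
// Java client API. Memory is specified in MB, CPU in virtual cores.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerRequestSketch {
    public static void main(String[] args) {
        // container_0 = 2 GB, 1 CPU
        Resource small = Resource.newInstance(2048, 1);
        // container_1 = 1 GB, 6 CPUs
        Resource wide = Resource.newInstance(1024, 6);

        Priority priority = Priority.newInstance(0);
        // No node/rack constraints (null, null): the scheduler may place these anywhere.
        ContainerRequest req0 = new ContainerRequest(small, null, null, priority);
        ContainerRequest req1 = new ContainerRequest(wide, null, null, priority);

        System.out.println(req0 + " / " + req1);
    }
}
```

Unlike MR1's fixed map/reduce slots, each request states exactly how much of each resource it needs, and different containers in the same application can have different shapes.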
7
The NodeManager (NM)
YARN's per-node agent -- takes care of an individual compute node in a Hadoop cluster:
Keeping up to date with the ResourceManager (RM)
Overseeing containers' life-cycle management; monitoring resource usage (memory, CPU) of individual containers
Tracking node health, log management, and auxiliary services that may be exploited by different YARN applications
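An ApplicationMaster typically talks to a NodeManager through the NMClient helper in hadoop-yarn-client. The sketch below is illustrative only; the container and launch context are assumed to come from an earlier allocation (see the ResourceManager sketch further down).

```java
// Illustrative sketch: asking a NodeManager to launch a container and then
// polling the container's status, which the NM tracks along with its resource usage.
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeManagerSketch {
    static void launchAndTrack(Container container, ContainerLaunchContext launchContext)
            throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // Ask the NM that owns this container to start it.
        nmClient.startContainer(container, launchContext);

        // The NM manages the container's life cycle; poll its current state.
        ContainerStatus status =
                nmClient.getContainerStatus(container.getId(), container.getNodeId());
        System.out.println("Container state: " + status.getState());

        nmClient.stop();
    }
}
```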
8
YARN architecture
Application Master: an instance of a framework-specific library, responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption. It negotiates appropriate resource containers from the ResourceManager and tracks their status and progress.
Resource Manager: a pure scheduler -- it arbitrates the available resources in the system among the competing applications. It optimizes for cluster utilization (keeping all resources in use all the time) against constraints such as capacity guarantees, fairness, and SLAs, and has a pluggable scheduler that allows different algorithms, such as capacity and fair scheduling, to be used as necessary.
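The ApplicationMaster-to-ResourceManager negotiation can be sketched with AMRMClient from hadoop-yarn-client. This is a hypothetical, simplified example; the host/port/tracking-URL arguments and the resource sizes are assumptions, not values from the slides.

```java
// Illustrative sketch: an ApplicationMaster registering with the ResourceManager,
// requesting a container, and picking up the allocation on a heartbeat.
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();

        // Register this ApplicationMaster with the ResourceManager.
        rm.registerApplicationMaster("", 0, "");

        // Negotiate one 2 GB / 1 vcore container; placement is left to the scheduler.
        ContainerRequest request =
                new ContainerRequest(Resource.newInstance(2048, 1), null, null,
                        Priority.newInstance(0));
        rm.addContainerRequest(request);

        // Heartbeat to the RM; allocated containers come back in the response
        // and would then be launched via NMClient (see the NodeManager sketch above).
        AllocateResponse response = rm.allocate(0.0f);
        for (Container allocated : response.getAllocatedContainers()) {
            System.out.println("Got container " + allocated.getId()
                    + " on " + allocated.getNodeId());
        }

        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rm.stop();
    }
}
```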
9
YARN Workflow
A client submits an application to the ResourceManager; the ResourceManager allocates a container and launches the ApplicationMaster in it; the ApplicationMaster registers with the ResourceManager and negotiates further containers; the NodeManagers launch and monitor those containers; when the application finishes, the ApplicationMaster unregisters and its resources are released.
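The first step of this workflow, submitting an application to the ResourceManager, can be sketched with YarnClient. This is a hypothetical client-side example; the application name, queue, command, and resource sizes are assumptions for illustration.

```java
// Illustrative sketch: submitting an application to the ResourceManager with
// YarnClient (hadoop-yarn-client). The AM container spec here just runs a
// placeholder command.
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitAppSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());
        yarn.start();

        YarnClientApplication app = yarn.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("demo-app");
        ctx.setQueue("default");

        // Container spec for the ApplicationMaster itself: 1 GB, 1 vcore.
        ContainerLaunchContext amSpec = ContainerLaunchContext.newInstance(
                Collections.<String, LocalResource>emptyMap(),  // local resources
                Collections.<String, String>emptyMap(),         // environment
                Collections.singletonList("sleep 60"),          // command(s)
                null, null, null);
        ctx.setAMContainerSpec(amSpec);
        ctx.setResource(Resource.newInstance(1024, 1));

        ApplicationId appId = yarn.submitApplication(ctx);
        System.out.println("Submitted application " + appId);
    }
}
```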
10
Example Application Scenario -- Storm on YARN
https://www.youtube.com/watch?v=L9gyimNNARc
https://github.com/yahoo/storm-yarn
11
Example Application Scenario -- HBase on YARN
https://github.com/hortonworks/hoya
12
References
http://blog.cloudera.com/blog/2012/10/mr2-and-yarn-briefly-explained/
http://hortonworks.com/blog/introducing-apache-hadoop-yarn/