Part III BigData Analysis Tools (YARN) Yuan Xue


1 Part III BigData Analysis Tools (YARN)
Yuan Xue (yuan.xue@vanderbilt.edu)
http://www.slideshare.net/hortonworks/apache-hadoop-yarn-enabling-nex

2 Motivation  Review of MapReduce (MR1)

3 Motivation: Limitations of MR1
- Scalability
  - Maximum cluster size: 4,000 nodes
  - Maximum concurrent tasks: 40,000
  - Coarse synchronization in the JobTracker
- Availability
  - A JobTracker failure kills all queued and running jobs
- Hard partition of resources into map and reduce slots
  - Low resource utilization
- Lacks support for alternate paradigms and services
  - Iterative applications implemented using MapReduce are 10x slower
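
Why hard-partitioned slots lower utilization can be seen with simple arithmetic. The sketch below is illustrative only: the node size, the 8/4 map/reduce slot split, and the workload are hypothetical numbers, and each container is assumed to be the same size as one slot.

```python
# Illustrative arithmetic only: the node size, slot split, and workload are
# hypothetical, and each container is assumed to be the size of one slot.

def slot_utilization(map_slots: int, reduce_slots: int,
                     pending_maps: int, pending_reduces: int) -> float:
    """MR1-style: map tasks may only use map slots, reduce tasks only reduce slots."""
    busy = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return busy / (map_slots + reduce_slots)


def container_utilization(capacity: int, pending_maps: int, pending_reduces: int) -> float:
    """YARN-style: any free unit of capacity can run either kind of task."""
    busy = min(capacity, pending_maps + pending_reduces)
    return busy / capacity


# A hypothetical node with 8 map slots + 4 reduce slots, during a map phase
# with 20 map tasks queued and no reduce tasks ready yet.
print(round(slot_utilization(8, 4, pending_maps=20, pending_reduces=0), 2))     # 0.67
print(round(container_utilization(12, pending_maps=20, pending_reduces=0), 2))  # 1.0
```

In this example the four reduce slots sit idle while map tasks wait in the queue, leaving a third of the node unused; removing the map/reduce distinction from the unit of allocation is what closes that gap.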

4 From MR1 to MR2: YARN
Apache Hadoop YARN is an attempt to take Apache Hadoop beyond MapReduce for data processing. YARN was originally part of the Hadoop MapReduce project and is now poised to stand on its own as a sub-project of Hadoop.

5 YARN Overview
- YARN stands for "Yet Another Resource Negotiator"

6 YARN architecture  Application  Application is a job submitted to the framework  Example – Map Reduce Job  Container  Basic unit of allocation  Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu etc.)  container_0 = 2GB, 1CPU  container_1 = 1GB, 6 CPU  Replaces the fixed map/reduce slots

7
- The NodeManager (NM)
  - YARN's per-node agent; it takes care of the individual compute nodes in a Hadoop cluster:
    - Keeping up to date with the ResourceManager (RM)
    - Overseeing containers' life-cycle management; monitoring the resource usage (memory, CPU) of individual containers
    - Tracking node health, log management, and auxiliary services that different YARN applications may exploit
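
The NodeManager's duties of tracking containers, monitoring their usage, and reporting to the ResourceManager can be sketched as a small state object with a heartbeat. Everything here is hypothetical: `NodeManagerSketch`, the report fields, and the policy of killing a container that exceeds its memory allocation are illustrative assumptions, not the real NodeManager implementation or protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContainerState:
    allocated_mb: int
    used_mb: int = 0
    status: str = "RUNNING"


@dataclass
class NodeManagerSketch:
    """Hypothetical per-node agent: tracks its containers and reports to the RM."""
    node_id: str
    capacity_mb: int
    containers: Dict[str, ContainerState] = field(default_factory=dict)
    healthy: bool = True

    def launch(self, container_id: str, allocated_mb: int) -> None:
        self.containers[container_id] = ContainerState(allocated_mb)

    def record_usage(self, container_id: str, used_mb: int) -> None:
        # A real agent would sample usage from the OS; here the caller supplies it.
        self.containers[container_id].used_mb = used_mb

    def enforce_limits(self) -> List[str]:
        """Assumed policy: kill any running container using more memory than allocated."""
        killed = []
        for cid, c in self.containers.items():
            if c.status == "RUNNING" and c.used_mb > c.allocated_mb:
                c.status = "KILLED"
                killed.append(cid)
        return killed

    def heartbeat(self) -> dict:
        """Build the status report that keeps the RM up to date."""
        self.enforce_limits()
        running = [c for c in self.containers.values() if c.status == "RUNNING"]
        return {
            "node": self.node_id,
            "healthy": self.healthy,
            "containers": {cid: c.status for cid, c in self.containers.items()},
            "free_mb": self.capacity_mb - sum(c.allocated_mb for c in running),
        }


nm = NodeManagerSketch("node-1", capacity_mb=8192)
nm.launch("container_0", 2048)
nm.launch("container_1", 1024)
nm.record_usage("container_0", 1500)
nm.record_usage("container_1", 1300)  # exceeds its 1 GB allocation
report = nm.heartbeat()
print(report["containers"])  # {'container_0': 'RUNNING', 'container_1': 'KILLED'}
print(report["free_mb"])     # 6144
```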

8 YARN architecture  Application Master  Instance of a framework-specific library -- responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption.  It has the responsibility of negotiating appropriate resource containers from the ResourceManager, tracking their status and monitoring progress.  Resource Manager  Pure scheduler -- arbitrating available resources in the system among the competing applications  Optimizes for cluster utilization (keep all resources in use all the time) against various constraints such as capacity guarantees, fairness, and SLAs.  Has a pluggable scheduler that allows for different algorithms such as capacity and fair scheduling to be used as necessary.

9 YARN Workflow
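
The workflow on this slide can be summarized as an ordered sequence of messages between the client, ResourceManager, ApplicationMaster, and NodeManagers. The sketch below simulates that sequence as an event log; it calls no Hadoop APIs, `run_yarn_workflow` is a hypothetical helper, and the round-robin placement of granted containers is an assumption made purely for illustration.

```python
from typing import List


def run_yarn_workflow(app_name: str, tasks: int, nodes: List[str]) -> List[str]:
    """Simulate the high-level YARN application lifecycle as an ordered event log.

    Every step is a simplified stand-in; no Hadoop APIs are called.
    """
    log = []
    # 1. A client submits the application to the ResourceManager.
    log.append(f"client -> RM: submit {app_name}")
    # 2. The RM allocates a first container and has a NodeManager launch the ApplicationMaster in it.
    am_node = nodes[0]
    log.append(f"RM -> NM[{am_node}]: launch ApplicationMaster for {app_name}")
    # 3. The AM registers with the RM, then requests containers for its tasks.
    log.append(f"AM -> RM: register; request {tasks} containers")
    # 4. The RM grants containers (round-robin over nodes here, purely for illustration).
    grants = [nodes[i % len(nodes)] for i in range(tasks)]
    log.append(f"RM -> AM: granted containers on {grants}")
    # 5. The AM contacts each NodeManager directly to launch a task, then monitors progress.
    for i, node in enumerate(grants):
        log.append(f"AM -> NM[{node}]: launch task {i}")
    # 6. When all tasks finish, the AM unregisters and its containers are released.
    log.append("AM -> RM: unregister (SUCCEEDED)")
    return log


for line in run_yarn_workflow("wordcount", tasks=3, nodes=["n1", "n2"]):
    print(line)
```

Note that the ResourceManager only grants containers; it is the ApplicationMaster that talks to the NodeManagers to start and monitor the work, which keeps framework-specific logic out of the RM.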

10 Example Application Scenario: Storm on YARN
- https://www.youtube.com/watch?v=L9gyimNNARc
- https://github.com/yahoo/storm-yarn

11 Example Application Scenario: HBase on YARN
- https://github.com/hortonworks/hoya

12 References
- http://blog.cloudera.com/blog/2012/10/mr2-and-yarn-briefly-explained/
- http://hortonworks.com/blog/introducing-apache-hadoop-yarn/

