
1 Pepper: An Elastic Web Server Farm for Cloud based on Hadoop
Author: S. Krishnan, J.-S. Counio
Date: 2012-05-21
Speaker: Sian-Lin Hong
IEEE International Conference on Cloud Computing Technology and Science

2 Outline
1. Introduction
2. Design
3. Features
4. Comparison with other systems
5. Conclusion
6. Future work

3 1. Introduction (1/3)
1. An emerging trend in many organizations is to build their own cloud infrastructure based on open source technologies.
2. Hadoop is one of the more popular frameworks used in cloud infrastructure.
3. At Yahoo!, we have been using Hadoop MapReduce to crunch large data sets, typically on the order of terabytes per day.

4 1. Introduction (2/3)
1. We created a distributed workflow management system called PacMan.
2. While this platform was quite successful with low-frequency large feeds, it was inefficient for high-frequency small feeds like news feeds.
3. We built Pepper, a simple platform using Hadoop and ZooKeeper that can run different web applications on the same Hadoop grid where other regular MapReduce jobs are processed.

5 1. Introduction (3/3)
1. Pepper requirements:
   - Be able to support all properties: News, Finance, Travel, ...
   - Scalable
   - Isolation, multiple native library versions
   - Low overhead (< 5 sec)
   - Compatible with the PacMan API
   - Reuse PacMan code/infrastructure as much as possible

6 2. Design (1/7)
1. Web applications are copied onto the Hadoop Distributed File System (HDFS) and a web server is started per instance in the form of a Hadoop job.
2. To keep things simple and guarantee isolation, we define that: 1 Hadoop job = 1 Map task = 1 Web server = 1 Web application (see the sketch after this list).
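To make the "one map task = one web server" idea concrete, here is a minimal, hypothetical sketch of a mapper that starts an embedded Jetty server and then blocks for the lifetime of the job; the WAR path and port handling are assumptions, not details from the paper.

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.webapp.WebAppContext;

    // Hypothetical: the whole "job" is one map task that hosts one web app.
    public class WebServerMapper extends MapReduceBase
        implements Mapper<NullWritable, NullWritable, NullWritable, NullWritable> {

      public void map(NullWritable key, NullWritable value,
                      OutputCollector<NullWritable, NullWritable> out,
                      Reporter reporter) throws IOException {
        try {
          Server server = new Server(0);          // port 0 = any free port
          WebAppContext app = new WebAppContext();
          app.setWar("/tmp/app.war");             // assumed: WAR already pulled from HDFS
          server.setHandler(app);
          server.start();
          // The task deliberately never finishes; it occupies its map slot for
          // as long as the web server lives. A real version would also call
          // reporter.progress() periodically so Hadoop does not time it out.
          server.join();
        } catch (Exception e) {
          throw new IOException(e);
        }
      }
    }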

7 2. Design (2/7)
1. Web applications are deployed onto the grid using the Job Manager.
2. Map Web Engine starts web servers on demand; these can come up on any of the Hadoop nodes.
3. Web applications register themselves in a central registry maintained in ZooKeeper (a registration sketch follows this list).
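Registration could look like the following minimal sketch; the connect string, session timeout, and znode path are illustrative assumptions. Using an ephemeral node means the entry vanishes automatically when the server's ZooKeeper session dies, which is what makes dead servers detectable.

    import java.net.InetAddress;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class RegisterServer {
      public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, event -> {});
        // Assumed layout: /pepper/<application>/hostnames/<host:port>
        // (parent znodes are assumed to already exist)
        String path = "/pepper/news-feed/hostnames/"
            + InetAddress.getLocalHost().getHostName() + ":8080";
        // Ephemeral: removed by ZooKeeper itself if this server stops heartbeating.
        zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      }
    }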

8 2. Design (3/7)
4. Proxy Router acts as a router: it looks up the registry and redirects each request to the appropriate web server.

9 2. Design (4/7)
1. ZooKeeper
   - A high-performance coordination service for distributed applications.
   - It acts as the central registry; the information about the tasks that run the web applications is maintained here.
   - ZooKeeper can scale up to hundreds of thousands of transactions per second on a majority-read system.
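The paper does not spell out the znode tree; a layout consistent with the registry described above might look like this (all names illustrative):

    /pepper
      /news-feed
        /config            desired server count and application settings
        /hostnames
          node17:8080      ephemeral, created by the running web server
          node42:8080      ephemeral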

10 2. Design (5/7)
2. Job Manager
   - This component exposes an API to deploy a web application and copies it onto a dedicated location in HDFS.
   - Job Manager registers a watcher with ZooKeeper to receive notifications in case any of the servers become unresponsive.
   - It also runs a monitor thread that periodically checks that the number of web servers is consistent with the number configured in ZooKeeper (a minimal reconciliation sketch follows):
     - If there are too many servers, it kills the jobs running the extra web servers.
     - If there are too few, it spawns new jobs.
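A minimal sketch of that reconciliation loop, assuming the znode layout above; the spawnJob/killJob helpers and the 60-second period are hypothetical, not the paper's actual design.

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class ServerCountMonitor implements Runnable {
      private final ZooKeeper zk;
      ServerCountMonitor(ZooKeeper zk) { this.zk = zk; }

      public void run() {
        try {
          while (true) {
            // Desired count, as configured in ZooKeeper (assumed path).
            int desired = Integer.parseInt(new String(
                zk.getData("/pepper/news-feed/config/count", false, null),
                StandardCharsets.UTF_8));
            // Actual count = live ephemeral registrations.
            List<String> running = zk.getChildren("/pepper/news-feed/hostnames", false);
            for (int i = running.size(); i < desired; i++)
              spawnJob("news-feed");                      // too few: start new jobs
            for (int i = desired; i < running.size(); i++)
              killJob(running.get(i));                    // too many: kill extras
            Thread.sleep(60_000);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }

      private void spawnJob(String app) { /* submit a one-map Hadoop job for app */ }
      private void killJob(String server) { /* kill the Hadoop job behind server */ }
    }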

11 2. Design (6/7)
3. Map Web Engine
   - This is the implementation of a Hadoop map task.
   - Since the web application can run on any machine in the cluster, the generated log files are spread across the grid.
   - We have developed a custom log4j appender that rolls the logs over onto a common location in HDFS (a sketch follows).
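The appender below is a deliberately simplified stand-in: instead of rolling completed files over as the paper describes, it writes each log line straight to a per-host file in HDFS. It assumes a Hadoop 2.x client, and the HDFS path is illustrative.

    import java.net.InetAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.log4j.AppenderSkeleton;
    import org.apache.log4j.spi.LoggingEvent;

    public class HdfsAppender extends AppenderSkeleton {
      private FSDataOutputStream out;

      @Override
      public void activateOptions() {
        try {
          FileSystem fs = FileSystem.get(new Configuration());
          String host = InetAddress.getLocalHost().getHostName();
          out = fs.create(new Path("/pepper/logs/" + host + ".log"));  // assumed path
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }

      @Override
      protected void append(LoggingEvent event) {
        try {
          out.writeBytes(layout.format(event));
          out.hflush();  // make the line visible to readers without closing the file
        } catch (Exception e) {
          errorHandler.error("HDFS log write failed", e, 0);
        }
      }

      @Override
      public void close() { try { out.close(); } catch (Exception ignored) {} }

      @Override
      public boolean requiresLayout() { return true; }
    }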

12 2. Design (7/7)
4. Proxy Router
   - This is the single entry point of the system, where clients send their requests.
   - On receiving a request, the Proxy Router gets the list of available web servers (hostname:port) that can run the web application from the ZooKeeper hostnames node.
   - It then forwards the request to one of the available web servers.
   - It has a retry mechanism to handle the race condition where the web server has just gone down (a forwarding sketch follows).
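A minimal sketch of lookup, forwarding, and retry, assuming the registry layout above; random shuffling as the distribution policy and the 2-second timeout are assumptions, not the paper's stated behavior.

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class ProxyRouter {
      private final ZooKeeper zk;
      ProxyRouter(ZooKeeper zk) { this.zk = zk; }

      InputStream forward(String app, String pathAndQuery) throws Exception {
        // Live servers = children of the app's hostnames node (assumed path).
        List<String> servers = zk.getChildren("/pepper/" + app + "/hostnames", false);
        Collections.shuffle(servers);      // naive request distribution
        for (String server : servers) {    // retry: fall through to the next server
          try {
            URL url = new URL("http://" + server + pathAndQuery);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(2000);
            return conn.getInputStream();
          } catch (IOException e) {
            // The server may have just died and its ephemeral znode not yet
            // expired; try another one.
          }
        }
        throw new IOException("no live web server for " + app);
      }
    }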

13 3. Features
1. Scalability
2. Performance
3. High Availability and Self-healing
4. Isolation
5. Ease of Development
6. Reuse of Grid Infrastructure

14 4. Comparison with other systems (1/2)

15 4. Comparison with other systems (2/2)
1. Qualified with a simple workflow on a cluster of 3 Hadoop slaves.

16 5. Conclusion (1/2)
1. Pepper uses Hadoop and ZooKeeper, both open source.
2. Low latency.
3. High throughput.
4. The cost and complexity are low.

17 5. Conclusion (2/2)
1. Pepper has been in production since December 2009.
2. The simple API and the advantages that it provides have led to its rapid adoption.
3. We have been able to scale linearly by simply adding more machines, which results in more Hadoop map task slots being available to take on new processing needs.

18 6. Future work
1. On-demand allocation of servers.
2. Async NIO between Proxy Router and Map Web Engine to increase scalability.
3. Improving distribution of requests across web servers.
4. Follow the Hadoop roadmap.

