
Ashish Patro, MinJae Hwang, Thanumalayan S., Thawan Kooburat


2 Agenda
- Overview
- Job Scheduler Characteristics
- Scheduling Prototypes
- Performance Data

3 Introduction
- Key idea: a centralized job scheduler hosted in the cloud
- Benefits and motivation:
  - Simpler deployment and maintenance: only the worker daemon has to be deployed
  - Scalability of the cloud: near-unlimited scaling with a pay-as-you-go model
  - Simpler system design: the platform provides reliable services

4 Overview

5 Platform Choice
- Amazon EC2: on-demand VMs
- Google App Engine: web application hosting platform
  - Reliable and scalable storage: DataStore
  - Automatic load balancing
  - Other services: Memcached, instant messaging, email, cron jobs, …

6 Job Scheduler Characteristics
Condor job scheduling (revisited):
- Central Manager
  - Collector: ClassAd storage
  - Negotiator: matchmaker
- Schedd: job queue
- Startd: worker node
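
The Negotiator's matchmaking step can be illustrated with a minimal sketch. It assumes a simplified ad model in which each party's Requirements and Rank are plain Python callables rather than the real ClassAd expression language; the names `Ad`, `matches`, and `negotiate` are hypothetical and do not come from Condor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, object]


@dataclass
class Ad:
    """Hypothetical, simplified stand-in for a ClassAd: attributes plus two predicates.

    Real ClassAds use their own expression language; Python callables are
    used here only for illustration.
    """
    name: str
    attrs: Attrs
    requirements: Callable[[Attrs], bool] = lambda other: True
    rank: Callable[[Attrs], float] = lambda other: 0.0


def matches(job: Ad, machine: Ad) -> bool:
    # Two-way match: each ad's Requirements must accept the other ad's attributes.
    return job.requirements(machine.attrs) and machine.requirements(job.attrs)


def negotiate(jobs: List[Ad], machines: List[Ad]) -> List[Tuple[str, str]]:
    """One negotiation cycle: give each job its highest-ranked free machine."""
    free = list(machines)
    pairs = []
    for job in jobs:
        candidates = [m for m in free if matches(job, m)]
        if not candidates:
            continue  # unmatched jobs wait for the next cycle
        best = max(candidates, key=lambda m: job.rank(m.attrs))
        pairs.append((job.name, best.name))
        free.remove(best)
    return pairs


machines = [Ad("slot1", {"Memory": 2048}), Ad("slot2", {"Memory": 512})]
jobs = [Ad("job1", {"Owner": "alice"},
           requirements=lambda m: m["Memory"] >= 1024,
           rank=lambda m: m["Memory"])]
print(negotiate(jobs, machines))  # [('job1', 'slot1')]
```

The greedy, one-job-at-a-time loop is a simplification chosen for readability; the real Negotiator also weighs user priorities and preemption.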

7 Job Scheduler Characteristics
- The job scheduler needs to process a large amount of data
(Diagram labels: job request, servlet process, Memcache, Datastore, daemon, Google App Engine.)

8 Memory Hierarchy
Tiers: local memory (static variable), Memcached, DataStore
Compared on: volatility, query support, serialization cost, global namespace
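
One way to use the three tiers together is a read-through lookup that tries the fastest tier first. The sketch below assumes plain dicts as stand-ins for each tier; the class `TieredStore` and its attributes are hypothetical and are not the prototypes' actual code.

```python
from typing import Dict, Optional


class TieredStore:
    """Read-through lookup across the three tiers, fastest first.

    Hypothetical stand-ins: `local` models a static variable inside one JVM,
    `cache` models Memcached, and `datastore` models the DataStore. The cache
    and datastore dicts are shared between instances (global namespace);
    `local` belongs to one instance and would vanish when its JVM is killed.
    """

    def __init__(self, cache: Dict[str, str], datastore: Dict[str, str]) -> None:
        self.local: Dict[str, str] = {}
        self.cache = cache
        self.datastore = datastore
        self.hits = {"local": 0, "cache": 0, "datastore": 0}

    def get(self, key: str) -> Optional[str]:
        if key in self.local:
            self.hits["local"] += 1
            return self.local[key]
        if key in self.cache:
            self.hits["cache"] += 1
            value = self.cache[key]
        elif key in self.datastore:
            self.hits["datastore"] += 1
            value = self.datastore[key]
            self.cache[key] = value  # warm the shared tier for other JVMs
        else:
            return None
        self.local[key] = value      # warm this JVM's private tier
        return value


cache, datastore = {}, {"slot1": "Memory = 2048"}
jvm_a, jvm_b = TieredStore(cache, datastore), TieredStore(cache, datastore)
jvm_a.get("slot1")  # read from the DataStore, then cached
jvm_b.get("slot1")  # read from the shared cache
jvm_b.get("slot1")  # read from jvm_b's local memory
print(jvm_a.hits, jvm_b.hits)
```

The sketch ignores invalidation: once a value is copied into local memory, updates written elsewhere are not seen, which is the price of skipping serialization and the shared namespace.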

9 Design Choices
3-tier design:
- Application logic
  - Scheduling interval: online or batch
  - Scheduling logic: DB query or ClassAd
- Data
  - Machine ClassAds
  - Machine states
  - Jobs
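
The difference between the two scheduling intervals can be shown with a small sketch. It assumes FIFO matching of job names to idle machine names instead of ClassAd or DB-query logic, and the `Scheduler` class with its `submit`, `release`, and `run_batch` methods is hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Match = Tuple[str, str]


class Scheduler:
    """Hypothetical scheduler illustrating the two scheduling intervals.

    online=True:  try to match a job as soon as its request arrives.
    run_batch():  match everything still pending, e.g. when triggered by a
                  periodic request. Using both gives an online + batch hybrid.
    Matching is plain FIFO here, not ClassAd or DB-query logic.
    """

    def __init__(self, idle_machines: List[str], online: bool) -> None:
        self.idle: Deque[str] = deque(idle_machines)
        self.pending: Deque[str] = deque()
        self.online = online

    def submit(self, job: str) -> Optional[Match]:
        self.pending.append(job)
        return self._match_one() if self.online else None

    def release(self, machine: str) -> None:
        """A machine became idle again (e.g. its job finished)."""
        self.idle.append(machine)

    def _match_one(self) -> Optional[Match]:
        if self.pending and self.idle:
            return (self.pending.popleft(), self.idle.popleft())
        return None

    def run_batch(self) -> List[Match]:
        done = []
        while (pair := self._match_one()) is not None:
            done.append(pair)
        return done


hybrid = Scheduler(["m1"], online=True)
print(hybrid.submit("j1"))  # ('j1', 'm1')
print(hybrid.submit("j2"))  # None: no idle machine, so j2 stays pending
hybrid.release("m1")
print(hybrid.run_batch())   # [('j2', 'm1')]
```

Online matching gives low latency per request but repeats the work on every request; batch matching amortizes one pass over many jobs at the cost of waiting for the next interval.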

10 Scheduling Prototypes
(Diagram: for each prototype, where items labeled C, J, and M are kept across local memory (static variable), Memcached, and DataStore.)

Prototype            v1        v2        v3
Matching interval    Batch     Online    Online + Batch
Logic                ClassAd   ClassAd   DB Query
Data retrieval time  3.0 s     0.76 s    ---
Requests/second      3.20      3.70      19.0
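
The relative gains between prototypes follow directly from the table; this short calculation only restates those figures as ratios and makes no further assumptions.

```python
# Figures copied from the prototype table on this slide.
requests_per_second = {"v1": 3.20, "v2": 3.70, "v3": 19.0}
retrieval_time_s = {"v1": 3.0, "v2": 0.76}


def ratio(numerator: float, denominator: float) -> float:
    return numerator / denominator


print(f"v3 vs v1 throughput: {ratio(requests_per_second['v3'], requests_per_second['v1']):.2f}x")  # 5.94x
print(f"v3 vs v2 throughput: {ratio(requests_per_second['v3'], requests_per_second['v2']):.2f}x")  # 5.14x
print(f"v1 vs v2 retrieval time: {ratio(retrieval_time_s['v1'], retrieval_time_s['v2']):.2f}x")  # 3.95x
```

So v2 retrieves data roughly four times faster than v1 yet gains little throughput, while v3 is roughly five to six times faster than either.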

11 DataStore
- DataStore API calls consume many CPU cycles
  - It is easy to hit the hard DataStore limit of 20 CPU-sec/sec
  - Storing each ClassAd takes 0.22 CPU-sec
- DataStore rejects requests on highly contended cells
- DataStore is faster for retrieving a large amount of data
- Query predicates have to match pre-defined indices
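
The two numbers on this slide bound the write rate directly: dividing the quota by the per-ClassAd cost gives the most ClassAds that can be stored per second. The sketch uses only the slide's figures; the function names and the 1,000-ad example are hypothetical.

```python
import math

QUOTA_CPU_SEC_PER_SEC = 20.0    # hard DataStore limit quoted on the slide
CPU_SEC_PER_CLASSAD_PUT = 0.22  # measured cost of storing one ClassAd


def max_puts_per_second(quota: float = QUOTA_CPU_SEC_PER_SEC,
                        cost: float = CPU_SEC_PER_CLASSAD_PUT) -> int:
    """Whole ClassAd writes per second that fit under the quota."""
    return math.floor(quota / cost)


def min_seconds_to_store(n_ads: int,
                         quota: float = QUOTA_CPU_SEC_PER_SEC,
                         cost: float = CPU_SEC_PER_CLASSAD_PUT) -> float:
    """Lower bound on wall-clock time to write n_ads ClassAds within the quota."""
    return n_ads * cost / quota


print(max_puts_per_second())                  # 90
print(f"{min_seconds_to_store(1000):.1f} s")  # 11.0 s
```

At about 90 writes per second, refreshing every machine's ClassAd on each update quickly exhausts the quota, which is one reason to keep frequently changing data out of the DataStore.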

12 Memcache
- Size limit: 10K entries of 5K ClassAds
- Latency: retrieving 1.5K entries of 5K ClassAds takes 30 seconds
- Parallelism: multiple concurrent requests to Memcached do not degrade performance
- Only a get/set interface is provided: Memcached cannot be traversed or queried
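
Because the cache offers only get/set, enumerating all ClassAds requires keeping a list of their keys under a well-known key. The sketch assumes a dict-backed stand-in for the cache; `GetSetCache`, `INDEX_KEY`, `put_ad`, and `all_ads` are hypothetical names, not the Memcache API.

```python
from typing import Dict, List, Optional


class GetSetCache:
    """Hypothetical stand-in for a cache exposing only get/set (no key listing)."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)

    def set(self, key: str, value: object) -> None:
        self._data[key] = value

    def evict(self, key: str) -> None:
        """Simulates the cache dropping an entry on its own."""
        self._data.pop(key, None)


INDEX_KEY = "classad-index"  # hypothetical key holding the list of ClassAd keys


def put_ad(cache: GetSetCache, key: str, ad: object) -> None:
    cache.set(key, ad)
    keys = cache.get(INDEX_KEY) or []
    if key not in keys:
        # Read-modify-write: not safe against concurrent writers.
        cache.set(INDEX_KEY, keys + [key])


def all_ads(cache: GetSetCache) -> List[object]:
    """'Traverse' the cache by following the index; skip evicted entries."""
    ads = []
    for key in cache.get(INDEX_KEY) or []:
        ad = cache.get(key)
        if ad is not None:
            ads.append(ad)
    return ads


cache = GetSetCache()
put_ad(cache, "slot1", "Memory = 2048")
put_ad(cache, "slot2", "Memory = 512")
cache.evict("slot2")
print(all_ads(cache))  # ['Memory = 2048']
```

The index entry can itself be evicted, so a durable copy (for example in the DataStore) is still needed to rebuild it, and concurrent writers would need an atomic update that this sketch does not provide.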

13 Hosting Platform
- GAE dynamically spawns JVM processes
  - A new process is spawned only when all existing processes are busy
  - Each JVM has only one thread
  - A free account gets at most 10 JVM processes, i.e., at most 10 requests at a time
- The memory limit per JVM is around 110 MB
- JVM processes are short-lived
  - A process is killed after 110 seconds of idleness
  - A cron job is used to keep JVM processes alive
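
These limits translate into a simple throughput ceiling: with at most 10 single-threaded JVMs, at most 10 requests are in flight, so by Little's law throughput cannot exceed 10 divided by the mean request latency. The sketch takes its constants from the slide; the 0.5 s latency and 60 s / 120 s cron intervals are hypothetical inputs.

```python
MAX_JVMS = 10         # free-account limit quoted on the slide
THREADS_PER_JVM = 1   # each JVM serves one request at a time
IDLE_TIMEOUT_S = 110  # idle JVMs are killed after this long


def max_requests_per_second(avg_latency_s: float) -> float:
    """Upper bound on throughput by Little's law: concurrency / mean latency."""
    return MAX_JVMS * THREADS_PER_JVM / avg_latency_s


def keepalive_ok(cron_interval_s: float) -> bool:
    """A keep-alive request must arrive before the idle timeout expires."""
    return cron_interval_s < IDLE_TIMEOUT_S


print(max_requests_per_second(0.5))          # 20.0
print(keepalive_ok(60), keepalive_ok(120))  # True False
```

One caveat: each keep-alive request is served by a single process, so a periodic ping reliably keeps only one JVM warm, not all ten.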

14 Other Useful Services
- Instant messaging (XMPP): a convenient communication protocol
  - Between the CM (central manager) and workers
  - Between the CM and users (Google Talk)
- Email: offline notification

15 Conclusion

16 Testimonial
- Google App Engine is a crippled J2EE + RDBMS
- Scales with money
- A good testing platform
- Looks promising in the documentation
