Presentation is loading. Please wait.

Presentation is loading. Please wait.

MW: A framework to support Master Worker Applications Sanjeev R. Kulkarni Computer Sciences Department University of Wisconsin-Madison

Similar presentations


Presentation on theme: "MW: A framework to support Master Worker Applications Sanjeev R. Kulkarni Computer Sciences Department University of Wisconsin-Madison"— Presentation transcript:

1 MW: A framework to support Master Worker Applications Sanjeev R. Kulkarni Computer Sciences Department University of Wisconsin-Madison sanjeevk@cs.wisc.edu

2 www.cs.wisc.edu/condor 2 Outline of the talk › Introduction to MW › Architecture of MW › How to use MW? › Records shattered by MW! › Extensions

3 www.cs.wisc.edu/condor 3 What is MW? › MW = Master Worker › Object Oriented, Fault Tolerant framework for Master-Worker Applications › Can run on a variety of resource managers like Condor, PVM, Globus,...

4 www.cs.wisc.edu/condor 4 MW › Object Oriented  MW is a set of C++ base classes  Users write a few virtual functions › Fault Tolerant  Handles workers joining/leaving  Handles suspended/resumed workers  Checkpointing

5 www.cs.wisc.edu/condor 5 MW Framework Application MWLayer Resource Management and Communication Layer E.g. Condor, PVM, Globus,...

6 www.cs.wisc.edu/condor 6 Why use MW? › Handles communication layer › Handles resource Management  Useful especially in an opportunistic environment like Condor › Same application code can run on various resource managers

7 www.cs.wisc.edu/condor 7 MW Layer › Three main entities  MWDriver corresponds to the (Tyrant) Master  MWWorker corresponds to the (exploited) Worker  MWTask The work itself!

8 www.cs.wisc.edu/condor 8 MWDriver: The Master › Setup › Manages workers joining/leaving › Deals with HostSuspend/HostDelete › Maintains lists of tasks  ToDo, Done, Running › Acts on completed tasks › Checkpointing

9 www.cs.wisc.edu/condor 9 MWWorker: The Worker › Executes the given task  Unpacks the task got from Master  Executes the task  Packs results and sends back to Master

10 www.cs.wisc.edu/condor 10 MWTask: The Work › Definition of a Unit of work › Holds work to be done and results › Follows the path Master-Worker- Master

11 www.cs.wisc.edu/condor 11 Master Workers Running To Do T1 T2 T3... Global Data Done

12 www.cs.wisc.edu/condor 12 Master Workers Running To Do T2 T3 T4... Global Data Wid 1 W1 T1 Done

13 www.cs.wisc.edu/condor 13 Master Workers Running To Do T5 T6 T7... Global Data Wid 1 W1 T1 T2 T3 T4 Wid 2Wid 3 Wid 4 W2W3 W4 Done

14 www.cs.wisc.edu/condor 14 Master Workers Running To Do T6 T7 T8... Global Data Wid 1 W1 T1 T2 T5 T4 Wid 2Wid 3 Wid 4 W2W3 W4 T3 Done

15 www.cs.wisc.edu/condor 15 Master Workers Running To Do T2 T6 T7... Global Data Wid 1 W1 T1 T5 T4 Wid 2Wid 3 Wid 4 W2W3 W4 T3 Done

16 www.cs.wisc.edu/condor 16 Intelligent Scheduling of Tasks › Some tasks may be harder › Some machines may be faster › Task ordering based on user defined MWKey › Machine ordering based on Resource Manager information  e.g. Condor ClassAds

17 www.cs.wisc.edu/condor 17 Checkpointing › Save master state in case of failure › Automatic restart from checkpoint file in case of master failure › User controlled checkpoint frequency › Users need to implement two additional functions to take advantage of these.

18 www.cs.wisc.edu/condor 18 How can you use MW? › MWDriver’s virtual functions  get_user_info () - Processes arguments and does basic setup  setup_initial_tasks () - Fills in the ToDo list  pack_worker_init_data () - packs init data for worker  act_on_completed_tasks () - called every time a task completes. Optional.

19 www.cs.wisc.edu/condor 19 Using MW (cont.) › MWWorker  unpack_init_data () - unpack the init data sent by the master  execute_task () - execute a task and fill in the results › MWTask  pack_work (), unpack_work ()  pack_results (), unpack_results ()

20 www.cs.wisc.edu/condor 20 Resource Management and Communication Layer › Presently two implementations exist  MW-PVM Communication :PVM Resource Manager : Condor-PVM  MW-File Communication : Files Resource Manager : Condor

21 www.cs.wisc.edu/condor 21 MW-Independent › Single process version  master  single worker › sends and receives become memcpy › No changes in source! › Why?  Faster debugging

22 www.cs.wisc.edu/condor 22 Data Management › Exploitation of available memory as compared to CPU cycles › Aims to reduce access to storage site by caching data on network › Aimed at applications involving extremely large data sets

23 www.cs.wisc.edu/condor 23 Data Manager › Partitions large data sets into chunks › Workers work on chunks allocated by manager Data Manager W1W2W3 Storage DeviceMemory

24 www.cs.wisc.edu/condor 24 World Record Spree!! › Stochastic Linear Programming  “Record breaking” problem of over 8.5M rows and 35M columns solved › Quadratic Assignment Problem  nug27 in a little more than two days! › Jeff will talk more in his talk

25 www.cs.wisc.edu/condor 25 Extensions › More Resource Manager ports  Globus  A thread based version for SMPs 

26 www.cs.wisc.edu/condor 26 Scalability Issues › Scales linearly with respect to the number of workers › OK for 200 workers but for 1000? › Solution  Hierarchical MW?

27 www.cs.wisc.edu/condor 27 More about MW › http://www.cs.wisc.edu/condor/mw › Demos of MW in action using Condor pool tomorrow in Room 3397


Download ppt "MW: A framework to support Master Worker Applications Sanjeev R. Kulkarni Computer Sciences Department University of Wisconsin-Madison"

Similar presentations


Ads by Google