
1 Using Globus to Scale an Application
Case Study 4: Scientific Workflow for Computational Economics
Tiberiu Stef-Praun, Gabriel Madeira, Ian Foster, Robert Townsend

2 The Challenge
OSGCC 2008 | Globus Primer: An Introduction to Globus Software
- Expand capability of economists to develop and validate models of social interactions at large scales
  - Harness large computation systems
  - Simplify programming model (eye toward easy integration of science code)
  - Improve automation
- Requires an end-to-end approach, but through integration, not the “silo” model

3 Moral Hazard Problem
- An entity in control of some resources (the entrepreneur) contracts with other entities that use these resources to produce outputs (the workers)
- Two organizational forms are available
  - The workers cooperate on their efforts and divide up their income (thus sharing risks)
  - The workers are independent of each other, and are rewarded based on relative performance
- Both are stylized versions of what is observed in tenancy data in villages such as those in Maharashtra, India (Townsend and Mueller 1998)

4 Moral Hazard Solver
- Five stages, each solved by linear programming
  - Balance between promises for future and consumption to optimally reward agents
- In each stage: given a set of parameters (consumption, effort, technology, output, wealth)
  - Do a linear optimization to find the best behavior
  - Parameter sweep (grid of parameter values)
  - Linear solver is run independently on each point of the parameter grid
  - Results are merged at the end of the stage
- Across stages: different organization (parameters) for similar stage structure
  - Most stages depend on results of other stages
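The sweep-then-merge pattern within one stage can be sketched in Python as follows. This is a minimal illustration, not the authors' solver: `solve_point`, its toy objective, and the 3 x 2 grid are hypothetical stand-ins for the linear program and the real parameter grid.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def solve_point(point):
    """Hypothetical stand-in for the linear-programming solve at one grid point."""
    consumption, effort = point
    # Toy objective only; the real solver would run a linear program here.
    return {"point": point, "value": consumption - 0.5 * effort ** 2}


def run_stage(grid, workers=4):
    """Solve every grid point independently, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(solve_point, grid))
    # Merge step: one table keyed by grid point.
    return {r["point"]: r["value"] for r in results}


# Hypothetical 3 x 2 parameter grid (consumption x effort).
grid = list(product([1.0, 2.0, 3.0], [0.0, 1.0]))
merged = run_stage(grid)
best = max(merged, key=merged.get)
```

Because no grid point depends on another, the calls can run in any order or all at once; only the merge has to wait for every point to finish.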

5 Workflow diagram (figure)
- Input: *.mat input data files
- Stage One: 26 x StageOne.${i}.out
- Stage Two: 52 x StageTwo.${i}.out
- Stage Three: 40 x StageThree.${i}.out
- Stage Four: 40 x StageOne.${i}.out
- Stage Five
- Merged outputs: MergedStageOne.out, MergedStageTwo.out, MergedStageThree.out, MergedStageFour.out, MergedStageFive.out
- Legend: Remote Execution, Local Execution
- Timing labels: 50 min, 30 min, 3 min, 40 min, 2 min

6 Issues - Technical
- Language
  - Science code written in MATLAB/Octave
  - End-to-end system must be language-independent
- Code prerequisites
  - Each solver task requires MATLAB/Octave pre-installed on the execution node, and solver code staged in prior to execution
  - Each solver task requires files from previous stages
- Automation
  - ~200 tasks must be executed
  - This is a lot of “babysitting” if performed manually

7 Issues - Social
- Licensing
  - MATLAB licensing has a per-node cost
  - Expensive if you’re using O(10)+ nodes
- Provenance
  - Task execution, data integrity
  - Not a huge concern at this scale, but for larger scales (10,000 tasks) it is important to record how the work is performed
- Provisioning, resource sharing
  - This problem used a shared campus cluster (at U Chicago)
  - We know of problems with 2-3 orders of magnitude more tasks, which require (inter)national-scale resources to accomplish in a timely fashion

8 Swift System
- Swift is a Grid-enabled application framework
  - Emphasis on workflow and adapting legacy applications to a Grid environment
- Technical features
  - Clean separation of logical/physical concerns
    - XDTM specification of logical data structures
  - Concise specification of parallel programs
    - SwiftScript, with iteration, etc.
  - Efficient execution on distributed resources
    - Karajan threading, Falkon provisioning, Globus interfaces, pipelining, load balancing
  - Rigorous provenance tracking and query
    - Virtual data schema & automated recording
- => Improved usability and productivity
  - Demonstrated in numerous applications

9 Swift Architecture (figure)
- Specification: SwiftScript, abstract computation, Virtual Data Catalog, SwiftScript Compiler
- Scheduling: Execution Engine (Karajan w/ Swift Runtime), Swift runtime callouts, status reporting
- Execution: Virtual Node(s), launcher, App F1, App F2, file1, file2, file3, provenance data, provenance collector
- Provisioning: Falkon Resource Provisioner, Amazon EC2, dynamic provisioning
- Yong Zhao, Mihael Hategan, Ioan Raicu, Mike Wilde, Ben Clifford

10 Workflow Language - SwiftScript
- Goal: natural feel to expressing distributed applications
  - Variables (basic, data structures)
  - Control constructs (if, foreach, …)
  - Functions (atomic / compound)
- Used to connect outputs to inputs
- It does not specify invocation order, only dependencies
- It can be seen as metadata for expressing experiments
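The idea that a workflow states only dependencies, leaving invocation order to the engine, can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is a sketch of the general dataflow idea, not of how Swift is implemented; the task names and the dependency table are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: each task maps to the tasks whose outputs it consumes.
deps = {
    "solve_a": set(),
    "solve_b": set(),
    "merge": {"solve_a", "solve_b"},
    "report": {"merge"},
}


def run_by_dependencies(deps):
    """Return batches of tasks; tasks in the same batch could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs already exist
        batches.append(ready)
        sorter.done(*ready)                 # mark their outputs as available
    return batches


batches = run_by_dependencies(deps)
```

Nothing in `deps` says "run `solve_a` first"; the order falls out of which outputs are available, which is the property the slide attributes to SwiftScript.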

11 Execution Engine
- Karajan engine (event-based execution)
- Has a scheduler to map tasks to resources
  - Score-based planning
  - Recovers from failures (retries)
- Falkon resource manager creates a “virtual private cluster”
  - Uses Globus GRAM4 (PBS/Condor/Fork) to acquire resources from Grid systems
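One way to picture score-based planning with retries is the Python sketch below. It is an assumption-laden toy, not Karajan's actual algorithm: the site names, the score adjustments (-1.0 on failure, +0.5 on success), and `flaky_task` are all hypothetical.

```python
def run_with_retries(task, sites, scores, max_attempts=3):
    """Try the task on the best-scoring site; penalize failures and retry."""
    for attempt in range(1, max_attempts + 1):
        site = max(sites, key=lambda s: scores[s])
        try:
            result = task(site)
        except RuntimeError:
            scores[site] -= 1.0  # a failed site becomes less attractive
            continue
        scores[site] += 0.5      # reward the site that succeeded
        return site, result, attempt
    raise RuntimeError(f"task failed after {max_attempts} attempts")


def flaky_task(site):
    """Hypothetical task that always fails on site_a."""
    if site == "site_a":
        raise RuntimeError("node unavailable")
    return f"ok on {site}"


scores = {"site_a": 1.0, "site_b": 0.5}
site, result, attempt = run_with_retries(flaky_task, ["site_a", "site_b"], scores)
```

Here the first attempt goes to `site_a`, fails, and lowers its score, so the retry lands on `site_b`. Keeping scores across tasks lets later work avoid sites that have recently failed.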

12 The Solution
- Code changes
  - Solver code was broken into modules (atomic blocks) to allow parallel execution
  - Code ported from MATLAB to Octave to avoid per-node licensing fees
  - Workflow was described in SwiftScript
- Software installation
  - Swift engine, Karajan, Falkon deployed locally
- Shared resource (already available)
  - Existing compute cluster with GRAM4, GridFTP, etc.

13 Moral Hazard SwiftScript Code Excerpts

    // A second atomic procedure: merge
    (file mergeSolutions[]) econMerge (file merging[]) {
      app {
        econMerge @filenames(mergeSolutions) @filenames(merging);
      }
    }

    // We define the stage one procedure, a compound procedure
    (file solutions[]) stageOne (file inputData[], file prevResults[]) {
      file script;
      int batch_size = 26;
      int batch_range[] = [0:25];
      string inputName = "IRRELEVANT";
      string outputName = "stageOneSolverOutput";

      // The foreach statement specifies that the calls can be performed concurrently
      foreach i in batch_range {
        int position = i*batch_size;
        solutions[i] = moralhazard_solver(script, batch_size, position,
                                          inputName, outputName, inputData, prevResults);
      }
    }

    // These get used in the "main program" as follows
    stageOneSolutions = stageOne(stageOneInputFiles, stageOnePrevFiles);
    stageOneOutputs = econMerge(stageOneSolutions);

14 Execution on 40 Processors

15 Results - Moral Hazard Solver
- Performance
  - Original run time: ~2 hrs
  - Swift run time: ~28 min
  - Depending on the stage structure, speedup of up to 10x, or slowdown (because of overhead)
  - Only one grid site (UC) was used; running on multiple sites could give better performance
- Execution has been automated
  - Human labor greatly reduced
  - Separation of human concerns (science code, system operation, task management)
  - Easy to repeat, modify & rerun, etc.
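The overall speedup implied by the two run times can be worked out directly. The Python snippet below uses the slide's approximate figures; treating "~2 hrs" as 120 minutes, and dividing by the 40 processors named on the previous slide, are assumptions, since the deck does not state that the 28-minute run used all 40.

```python
original_min = 120.0  # "~2 hrs" from the slide, taken as 120 minutes (assumption)
swift_min = 28.0      # "~28 min" from the slide
processors = 40       # assumption: the 40-processor run shown on the previous slide

speedup = original_min / swift_min   # end-to-end speedup across all five stages
efficiency = speedup / processors    # fraction of ideal linear speedup achieved
```

Under these assumptions the end-to-end speedup is roughly 4.3x, well below the per-stage best case of 10x, which fits the slide's remark that some stages slow down because of overhead.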

16 Other Applications

Application                                   | #Jobs/computation                     | Levels
ATLAS* HEP Event Simulation                   | 500K                                  | 1
fMRI DBIC* AIRSN Image Processing             | 100s                                  | 12
FOAM Ocean/Atmosphere Model                   | 2000 (core app runs 250 8-CPU jobs)   | 3
GADU* Genomics (14 million seq. analyzed)     | 40K                                   | 4
HNL fMRI Aphasia Study                        | 500                                   | 4
NVO/NASA* Photorealistic Montage/Morphology   | 1000s                                 | 16
QuarkNet/I2U2* Physics Science Education      | 10s                                   | 3-6
RadCAD* Radiology Classifier Training         | 1000s                                 | 5
SIDGrid EEG Wavelet Proc, Gaze Analysis, …    | 100s                                  | 20
SDSS* Coadd, Cluster Search                   | 40K, 500K                             | 2, 8

17 Globus has…
- Modular architecture
- Well-defined APIs
- Embeddable libraries
- Web service interfaces
- Globus-enabled frameworks for MPI, RPC, parallel jobs, etc.
- A very experienced support team
- Globus support on national infrastructure

Globus doesn’t have…
- Your application already Grid-enabled
- A tool to automatically adapt your code
- Domain-specific frameworks

18 Other Grid-enabling Paths
- MPIg can run MPI applications on Grid infrastructure with little or no code change
  - Performance optimization is another story…
- Condor-G can submit tasks to GRAM2, GRAM4, Condor, etc.
- MyCluster can construct a virtual cluster out of several GRAM-accessible resources
- NinfG can run RPC applications on Grid infrastructure without even recompiling
- Introduce and gRAVI can build a Web service interface for your code and get it running on a GRAM-accessible resource so that others can invoke your code via WS

