
1 Swift: A language for scalable scientific computing Justin M. Wozniak Swift project: www.ci.uchicago.edu/swift Presenter contact: wozniak@mcs.anl.gov

2
• Parallel scripting language for clusters, clouds & grids
  – For writing loosely-coupled scripts of application programs and utilities linked by exchanging files
  – Can call scripts in shell, python, R, Octave, MATLAB, NCL, …
• Swift does 3 important things for you:
  – Makes parallelism transparent – with functional dataflow
  – Makes computing location transparent – can run your script on multiple distributed sites and diverse computing resources (from desktop to petascale)
  – Makes basic failure recovery transparent

3 Numerous many-task applications
• Simulation of super-cooled glass materials
• Protein folding using homology-free approaches
• Climate model analysis and decision making in energy policy
• Simulation of RNA-protein interaction
• Multiscale subsurface flow modeling
• Modeling of power grid for OE applications
• A-E have published science results obtained using Swift
(Figure: protein loop modeling, courtesy A. Adhikari – T0623, 25 res., 8.2Å to 6.3Å (excluding tail); native, predicted, and initial structures)

4 Example of when you need Swift: Process and analyze MODIS satellite land use datasets
Swift preprocessing & analysis script
• Input: hundreds of land cover tile files (forest, ice, urban, etc.) & ROI
• Output: regions with greatest concentration of specific land types
• Apps: image processing, statistics, output rendering
(Figure: 300+ MODIS files; largest zones of forest land-cover in analyzed region)

5 Dataflow and applications of MODIS pipeline
(Figure: dataflow through getLandUse ×317, analyzeLandUse, colorMODIS ×317, markMap, and assemble stages)
MODIS script is automatically run in parallel: each loop level can process tens to thousands of image files.

6 Encapsulation of applications enables automatic distributed parallel execution
(Figure: a Swift app( ) function is an interface definition wrapping an application program; typed Swift data objects correspond to the files expected or produced by the application program)
Declaring applications as Swift functions enables the Swift runtime system to make parallel execution implicit and data distribution and fault retry automatic.

7 app( ) functions specify argument and data passing
(Figure: the Swift getLandUse() app function maps inputs img and sortby and output o onto the getlanduse application's -i and -s arguments and redirected stdout)
To run an application program:
  getlanduse infile sortfield > outfile
First define this app function:
  app (file o) getLandUse (image img, int sortby) {
    getlanduse "-i" @img "-s" sortby stdout=@o;
  }
Then invoke the app function:
  file mfile ;
  file use ;
  use = getLandUse(mfile, 1);   // 1 = sort by usage
(Example files: H22v19.tif, H22v19.use)

8 MODIS script in Swift: main data flow
foreach g, i in geos {
  land[i] = getLandUse(g, 1);
}
(topSelected, selectedTiles) = analyzeLandUse(land, landType, nSelect);
foreach g, i in geos {
  colorImage[i] = colorMODIS(g);
}
gridMap = markMap(topSelected);
montage = assemble(selectedTiles, colorImage, webDir);

9 Programming model: all execution driven by parallel data flow
(int r) myproc (int i) {
  j = f(i);
  k = g(i);
  r = j + k;
}
• f() and g() are computed in parallel
• myproc() returns r when they are done
• This parallelism is automatic
• Works recursively throughout the program's call graph
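For readers who think in conventional languages, the same dataflow idea can be mimicked with explicit futures. The sketch below is only a loose Python analogue, not Swift's implementation: f, g, and myproc are the placeholder names from the slide, and the thread pool stands in for Swift's implicit parallel evaluation.

# Rough analogue of the slide's dataflow example: j and k are futures,
# so f(i) and g(i) run concurrently and r is produced when both are done.
from concurrent.futures import ThreadPoolExecutor

def f(i):               # placeholder computation
    return i * 2

def g(i):               # placeholder computation
    return i + 3

def myproc(i):
    with ThreadPoolExecutor() as pool:
        j = pool.submit(f, i)           # starts immediately
        k = pool.submit(g, i)           # runs in parallel with f
        return j.result() + k.result()  # waits for both futures

print(myproc(10))       # 10*2 + (10+3) = 33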

10 Swift runs applications on distributed parallel systems
The Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments.
(Figure: a Swift script on the submit host (login node, laptop, Linux server) drives application programs on remote resources, with a data server for file movement; clouds include Amazon EC2, XSEDE Wispy, FutureGrid, …)

11
• Swift is a parallel scripting system for grids, clouds and clusters
  – for loosely-coupled applications: application and utility programs linked by exchanging files
• Swift is easy to write: simple high-level C-like functional language
  – Small Swift scripts can do large-scale work
• Swift is easy to run: contains all services for running Grid workflow in one Java application
  – Untar and run – acts as a self-contained Grid client
• Swift is fast: uses an efficient, scalable and flexible distributed execution engine
  – Scaling close to 1M tasks
  – 0.5M in live science work, and growing
• Swift usage is growing:
  – applications in earth systems sciences, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, and more

12 Use: Parallel analysis & visualization of high-res climate models
AMWG Diagnostic Package results: rewriting the AMWG Diagnostic Package in Swift created a 3X speedup.
(a) The AMWG Diagnostic Package was used to calculate the climatological mean files for 5 years of 0.10 degree up-sampled data from a 0.25 degree CAM-SE cubed sphere simulation. This was run on Fusion, a cluster at Argonne, on 4 nodes using one core on each.
(b) The AMWG Diagnostic Package was used to compare two data sets, consisting of two sets of 30-year, 1 degree monthly average CAM files. This was run on one data analysis cluster node on Mirage at NCAR.
(c) The AMWG Package was used to compare 10 years of 0.5 degree resolution CAM monthly output files to observational data. This comparison was also run on one node on Mirage.
Work of S. Mickelson, R.L. Jacob, K. Schuchardt and the Pagoda Team, M. Woitasek, J. Dennis.
• Standard CESM diagnostic scripts for atmosphere, ocean, and ice models have been converted to Swift
• Using NCO and NCL as primary app tools
• Transcription of csh scripts yields automated parallel execution
• Swift versions are migrating into the standard UCAR-distributed versions and are in increasingly general community use
• Pagoda introduced to replace NCO processing with a tightly-coupled parallel algorithm for hybrid parallelism

13 CESM Diagnostic Package Speedups
(Figures: Fig 1 – Atmosphere (AMWG) Diagnostics, Swift vs. original; Fig 2 – AMWG Diagnostics with Pagoda; Fig 3 – Ice Diagnostics; Fig 4 – Ocean (OMWG) Diagnostics)

14 Use: Impact of cloud processes on precipitation
• Work of L. Zamboni, R.L. Jacob, R.V. Kotamarthi, T.J. Williams, I.M. Held, M. Zhao, C. Kerr, V. Balaji, J.D. Neelin, J.C. McWilliams
• Swift used to postprocess over 80 1-TB high-res GFDL HiRAM model runs
• Re-organization of data: combining model segments, annualization, and summarization/statistics
• Initial runs achieved 50 TB of model data processing in less than 3 days on 32 8-core nodes of ALCF Eureka
• Encapsulated existing scripts created by GFDL for HiRAM
• Only a few lines of Swift were needed to parallelize the HiRAM postprocessing scripts
• Scripts adjusted to optimize I/O

15 Use: land use and energy use models for CIM-EARTH and RDCEP projects
Below: CIM-EARTH energy-economics parameter sweeps of 5,000 AMPL models exploring uncertainty in consumer (top) and industrial (bottom) electricity usage projections by region for the next 5 decades.
Left: 120,000 DSSAT models of corn yield are executed in parallel using Swift, per campaign. CENW is also supported as the framework is generalized.
Work of J. Elliott, M. Glotter, D. Kelly, K. Maheshwari

16 Use: Hydrology assessment of Upper Mississippi River Basin
• Work of E. Yan, Y. Demisie, J. Margoliash, and other collaborators at Argonne
• Calibration challenge:
  – Four processes and 120 parameters
  – Processes are not independent
  – Long model run time (8 hours)
(Figure: model construction and calibration workflow – hydrologic cycle, nutrient cycle, crop growth, and routing; outputs include hydrology (runoff, evapotranspiration, groundwater, soil moisture), water quality (nutrients, erosion, pesticides), and crops (biomass, yield) for model projection)

17 When do you need Swift? Typical application: protein-ligand docking for drug screening
Work of M. Kubal, T.A. Binkowski, and B. Roux
(Figure: O(10) proteins implicated in a disease × O(100K) drug candidates = 1M compute jobs, yielding tens of fruitful candidates for wet lab & APS)

18 app( ) functions specify cmd line argument passing
(Figure: the Swift app function predict() maps inputs seq (a Fasta file), t, and dt, and outputs pg (pdb) and log, onto the PSim application's -s, -t, -d, -c arguments and redirected stdout)
To run:
  psim -s 1ubq.fas -pdb p -t 100.0 -d 25.0 > log
In Swift code:
  app (PDB pg, File log) predict (Protein seq, Float t, Float dt) {
    psim "-c" "-s" @seq.fasta "-pdb" @pg "-t" t "-d" dt;
  }
  Protein p ;
  PDB structure;
  File log;
  (structure, log) = predict(p, 100., 25.);

19 Large scale parallelization with simple loops
foreach sim in [1:1000] {
  (structure[sim], log[sim]) = predict(p, 100., 25.);
}
result = analyze(structure) …
(Figure: 1000 runs of the "predict" application feed a single analyze() step; example targets T1af7, T1r69, T1b72)

20 Nested parallel prediction loops in Swift
Sweep()
{
  int nSim = 1000;
  int maxRounds = 3;
  Protein pSet[ ] ;
  float startTemp[ ] = [ 100.0, 200.0 ];
  float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];
  foreach p, pn in pSet {
    foreach t in startTemp {
      foreach d in delT {
        ItFix(p, nSim, maxRounds, t, d);
      }
    }
  }
}
Sweep();
10 proteins x 1000 simulations x 3 rounds x 2 temps x 5 deltas = 300K tasks

21 ModFTDock
string str_roots[] = readData( @arg( "list" ) );
int n = @toint( @arg( "n", "1" ) );
foreach str_root in str_roots {
  string str_file_static = @strcat( @arg( "in", "input/" ), str_root, ".pdb" );
  string str_file_mobile = "input/4TRA.pdb";
  file_pdb file_static ;
  file_pdb file_mobile ;
  file_dat dat_files[] <simple_mapper; padding = 3, location=@arg("out", "output"),
                        prefix=@strcat(str_root, "_"), suffix=".dat">;
  foreach mod_index in [0:n-1] {
    string str_modulo = @strcat( mod_index, ":", modulus );
    dat_files[mod_index] = do_one_dock( str_root, str_modulo, file_static, file_mobile );
  }
}

22 Spatial normalization of functional run
(Figures: dataset-level workflow and expanded (10 volume) workflow)

23 Complex scripts can be well-structured programming in the large: fMRI spatial normalization script example
(Run snr) functional ( Run r, NormAnat a, Air shrink ) {
  Run yroRun = reorientRun( r, "y" );
  Run roRun = reorientRun( yroRun, "x" );
  Volume std = roRun[0];
  Run rndr = random_select( roRun, 0.1 );
  AirVector rndAirVec = align_linearRun( rndr, std, 12, 1000, 1000, "81 3 3" );
  Run reslicedRndr = resliceRun( rndr, rndAirVec, "o", "k" );
  Volume meanRand = softmean( reslicedRndr, "y", "null" );
  Air mnQAAir = alignlinear( a.nHires, meanRand, 6, 1000, 4, "81 3 3" );
  Warp boldNormWarp = combinewarp( shrink, a.aWarp, mnQAAir );
  Run nr = reslice_warp_run( boldNormWarp, roRun );
  Volume meanAll = strictmean( nr, "y", "null" );
  Volume boldMask = binarize( meanAll, "y" );
  snr = gsmoothRun( nr, boldMask, "6 6 6" );
}

(Run or) reorientRun ( Run ir, string direction) {
  foreach Volume iv, i in ir.v {
    or.v[i] = reorient(iv, direction);
  }
}

24 Dataset mapping example: fMRI datasets
type Study { Group g[ ]; }
type Group { Subject s[ ]; }
type Subject { Volume anat; Run run[ ]; }
type Run { Volume v[ ]; }
type Volume { Image img; Header hdr; }
(Figure: a mapping function or script connects the on-disk data layout to Swift's in-memory data model)

25 Simple campus Swift usage: running locally on a cluster login host
Swift can send work to multiple sites.
(Figure: a data server (GridFTP, scp, …), any remote host, and remote campus clusters reached through a Condor or GRAM provider, a remote data provider, local schedulers, and GRAM)

26 Coasters: Flexible worker-node agents for execution and data transport
• Main role is to efficiently run Swift tasks on allocated compute nodes, local and remote
• Handles short tasks efficiently
• Runs on:
  – Clusters: PBS, SGE, SLURM, SSH, etc.
  – Grid infrastructure: Condor, GRAM
  – Cloud infrastructure: Amazon, University clouds, etc.
  – HPC infrastructure: Blue Gene, Cray
• Runs with few infrastructure dependencies
• Can optionally perform data staging
• Provisioning can be automatic or external (manually launched or externally scripted)

27 Centralized campus cluster environment
Coasters deployment can be automatic or manual: Swift can deploy coasters automatically, or the user can deploy them manually or with a script. Swift provides scripts for clouds, ad-hoc clusters, etc.
(Figure: the Swift script contacts a data server and any remote host; a coaster auto-boot component and data provider start a coaster service on the remote host via GRAM or SSH, which runs coaster workers on the compute nodes through the cluster scheduler)

28 Coaster architecture handles diverse environments

29 Motivation: Queuing systems
• What we have (PBS, SGE)
• What we would like

30 Big Picture: Reusable service components
(Figure: applications (invocations, data, dependencies) connect through interfaces (coaster pilot job, coaster service) to services (cluster, HPC, grid, cloud) and resources (CPUs, storage))

31 Big Picture: No hands operation
(Figure: the client's coaster client bootstraps a coaster service on the head node via GRAM/SSH, using the coaster service code and a bootstrap cache)

32 Big Picture: Manage scientific applications
(Figure: application invocations – produce inputs, compute, analyze, reduce – flow through coaster interfaces (pilot job, service) to services (cluster, HPC, grid, cloud) and resources (CPUs, storage))

33 Motivation: File system use on Grids
• Typical file path when using GRAM
• For a stage-in, a file is read 3 times from disk, written 2 times to disk, and goes 4 times through some network
• Assumption: it's more efficient to copy data to compute node local storage than running a job directly on a shared FS
(Figure: stage-in and stage-out data paths across the submit site, head node, and compute node, through FS servers and disks)

34 Motivation: … can be made more efficient
• 2 disk reads, one write, and 2 times through network
• Assumption: compute node has no outside world access, otherwise the head node can be bypassed
(Figure: streamlined stage-in and stage-out path from the submit site through the head node to the compute node disk)

35 Big Picture: Make these resources uniform
• Application-level: task execution; data access and transfer
• Infrastructure-level: pilot job management; configuration management
• Enable automation of configuration
• Enable high portability

36 Big Picture: Create a live pathway between client side and compute nodes
(Figure: client applications (invocations, data) on a desktop connect through a coaster client to coaster services on cluster, grid site, and cloud resources, which manage coaster workers (CPU, storage) on the compute nodes)

37 Execution infrastructure - Coasters
• Coasters: a high task rate execution provider
  – Automatically deploys worker agents to resources with respect to user task queues and available resources
  – Implements the Java CoG provider interfaces for compatibility with Swift and other software
  – Currently runs on clusters, grids, and HPC systems
  – Can move data along with task submission
  – Contains a "block" abstraction to manage allocations containing large numbers of CPUs

38 Infrastructure: Execution/data abstractions
• We need to submit jobs and move data to/from various resources
• Coasters is implemented to use the Java CoG Kit providers
• Coasters implements the Java CoG abstractions
• Java CoG supports: local execution, PBS, SGE, SSH, Condor, GT, Cobalt
• Thus, it runs on: Cray, Blue Gene, OSG, TeraGrid, clusters, Bionimbus, FutureGrid, EC2, etc.
• Coasters can automatically allocate blocks of computation time in response to user load: "block queue processing"
• Runs as a service or bundled with Swift execution

39 Implementation: The job packing problem (I)
• We need to pack small jobs into larger jobs (blocks)
• Resources have restrictions:
  – Limit on total number of LRM jobs
  – Limit on job times
  – Non-unity granularity (e.g. CPUs can only be allocated in multiples of 16)
• We don't generally know where empty spots in the scheduling are
• We want to fit Coasters pilot jobs into the queue however we can

40 Implementation: The job packing problem (II) (not to scale)
• Sort incoming jobs based on walltime
• Partition required space into blocks
(Figure: jobs arranged on # CPUs vs. walltime axes)

41 Implementation: The job packing problem (II) (also not to scale)
• Commit jobs to blocks and adjust as necessary based on actual walltime
• The actual packing problem is NP-complete
• Solved using a greedy algorithm: always pick the largest job that will fit in a block first (see the sketch below)
(Figure: jobs packed into blocks on # CPUs vs. walltime axes, relative to "now")
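A minimal sketch of that greedy heuristic, assuming a fixed block shape. This is an illustration in Python only, not the Coasters scheduler itself (which lives in the Java CoG/Swift codebase and also handles allocation granularity, LRM limits, and dynamic adjustment); the function and variable names are hypothetical.

# Greedy packing sketch: sort jobs by walltime, then repeatedly place the
# largest remaining job that still fits somewhere in the current block.
def pack_into_blocks(jobs, block_walltime, block_cpus):
    """jobs: list of (job_id, walltime) pairs; returns a list of blocks,
    each block being a list of (job_id, cpu_slot) placements."""
    blocks = []
    remaining = sorted(jobs, key=lambda j: j[1], reverse=True)  # largest first
    while remaining:
        slots = [block_walltime] * block_cpus   # free time left on each CPU slot
        placed = []
        progress = True
        while progress:
            progress = False
            for job in list(remaining):         # scan largest-to-smallest
                job_id, walltime = job
                fits = [i for i, free in enumerate(slots) if free >= walltime]
                if fits:
                    slot = min(fits, key=lambda i: slots[i])  # best-fit slot
                    slots[slot] -= walltime
                    placed.append((job_id, slot))
                    remaining.remove(job)
                    progress = True
                    break                        # restart from the largest job
        if not placed:                           # job longer than a whole block
            raise ValueError("job exceeds block walltime: %r" % (remaining[0],))
        blocks.append(placed)
    return blocks

jobs = [("a", 50), ("b", 20), ("c", 20), ("d", 10), ("e", 45)]
print(pack_into_blocks(jobs, block_walltime=60, block_cpus=2))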

42 Use on commercial clouds
• Swift/Coasters is easy to deploy on Globus Provisioning (GP)
• GP provides simple start-up scripts and several other features we may use in the future (NFS, VM image, CHEF)
• Usage:
  0. Authenticate to AWS
  1. start-coaster-service / gp-instance-create / gp-instance-start
  2. swift myscript.swift … (repeat as necessary)
  3. stop-coaster-service / gp-instance-terminate
(Figure: Swift on the submit site drives a coaster service and coaster pilot jobs on GP+AWS, sharing data over NFS)

43 NAMD - Replica Exchange Method
• Original JETS use case: sizeable batch of short parallel jobs with data exchange
• Method extracts information about a complex molecular system through an ensemble of concurrent, parallel simulation tasks
Application parameters (approx.):
• 64 concurrent jobs x 256 cores per job = 16,384 cores
• 10-100 time steps per job = 10-60 seconds wall time
• Requires 6.4 MPI executions/sec. → 1,638 processes/sec.
• Over a 12-hour period = 70 million process starts
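A quick check of the arithmetic behind those figures, assuming the shortest 10-second jobs (which set the peak rate):

# Rough check of the REM sizing numbers quoted on the slide.
concurrent_jobs = 64
cores_per_job = 256
job_walltime_s = 10            # shortest jobs drive the peak rate

total_cores = concurrent_jobs * cores_per_job        # 16,384 cores
job_rate = concurrent_jobs / job_walltime_s          # 6.4 MPI executions/sec
process_rate = job_rate * cores_per_job              # ~1,638 process starts/sec
process_starts_12h = process_rate * 12 * 3600        # ~70 million over 12 hours

print(total_cores, job_rate, process_rate, round(process_starts_12h / 1e6, 1))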

44 Performance challenges for large batches
• For small application run times, the cost of application start-up, small I/O, library searches, etc. is expensive
• Existing HPC schedulers do not support this mode of operation
  – On the Blue Gene/P, job start takes 2-4 minutes
  – On the Cray, aprun job start takes a full second or so
  – Neither of these systems allows the user to make a fine-grained selection of cores from the allocation for small multicore/multinode jobs
• Solution pursued by JETS (sketched below):
  – Allocate worker agents en masse
  – Use a specialized user scheduler to rapidly submit user work to agents
  – Support dynamic construction of multinode MPI applications
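The core idea – start worker agents once and stream many short tasks to them – can be illustrated with a toy, single-machine analogue. The sketch below is Python and is not JETS (which runs agents on remote compute nodes, assembles multinode MPI jobs, and interacts with Hydra); it only shows why persistent agents pulling from a queue avoid paying scheduler start-up cost per task.

# Toy "worker agent" pattern: agents start once, then execute many short
# command lines pulled from a shared queue; a None sentinel shuts them down.
import multiprocessing as mp
import subprocess

def worker_agent(task_queue, results):
    """Runs for the whole allocation, executing many tasks one after another."""
    while True:
        cmd = task_queue.get()
        if cmd is None:                 # sentinel: allocation is ending
            break
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        results.put((cmd, proc.returncode))

if __name__ == "__main__":
    tasks = [f"echo task {i}" for i in range(100)]   # stand-ins for short app runs
    n_workers = 4                                    # stand-in for allocated nodes

    task_queue, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker_agent, args=(task_queue, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for cmd in tasks:                   # the "user scheduler" dispatches work
        task_queue.put(cmd)
    for _ in workers:                   # one sentinel per agent
        task_queue.put(None)
    for _ in tasks:
        print(results.get())
    for w in workers:
        w.join()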

45 JETS: Features
• Portable worker agents that run on compute nodes
  – Provides scripts to launch agents on common systems
  – Features provide convenient access to local storage such as the BG/P ZeptoOS RAM filesystem; storing the application binary, libraries, etc. here results in significant application start time improvements
• Central user scheduler to manage workers (stand-alone JETS or Coasters, discussed on following slides)
• MPICH/Hydra modification to allow "launcher=manual": tasks launched by the user (instead of SSH or other method)
• User scheduler plug-in to manage a local call to mpiexec
  – Processes output from mpiexec over local IPC, launches resultant single tasks on workers
  – Single tasks are able to find the mpiexec process and each other to start the user job (via Hydra proxy functionality)
  – Can efficiently manage many running mpiexec processes

46 Execution infrastructure - JETS
• Stand-alone JETS: a high task rate parallel-task launcher
  – User deploys worker agents via customizable, provided submit scripts
  – Currently runs on clusters, grids, and HPC systems
  – Great over SSH
  – Runs on the BG/P through ZeptoOS sockets – great for debugging, performance studies, ensembles
  – Faster than Coasters but provides fewer features
  – Input must be a flat list of command lines
  – Limited data access features

47 NAMD/JETS - Parameters
• NAMD REM-like case: tasks average just over 100 seconds
• ZeptoOS sockets on the BG/P: 90% efficiency for large messages, 50% efficiency for small messages
• Case provided by Wei Jiang

48 JETS - Task rates and utilization
• Calibration: sequential performance on synthetic jobs
• Utilization for REM-like case: not quite 90%

49 NAMD/JETS load levels
• Allocation size: 512 nodes
• Allocation size: 1024 nodes
• Load dips occur during exchange & restart

50 JETS - Misc. results
• Effective for short MPI jobs on clusters
• Single-second duration jobs on Breadboard cluster
• JETS can survive the loss of worker agents (BG/P)

51 JETS/Coasters
• JETS features were integrated with Coasters/Swift for elegant description of workflows containing calls to MPI programs
• The Coasters service manages workers as normal but was extended to aggregate worker cores for MPI execution
• Re-uses features from stand-alone JETS
• Enables dynamic allocation of MPI jobs; highly portable to BG/P, Cray, clusters, grids, clouds, etc.
• Relatively efficient but slower than stand-alone mode

52 NAMD REM in Swift
• Constructed SwiftScript to implement REM in NAMD
  – Whole script is about 100 lines
  – Intended to substitute for a multi-thousand line Python script (that is incompatible with the BG/P)
• Script core structures shown below
• Represents the REM data flow as Swift data items, statements, and loops

app (positions p_out, velocities v_out, energies e_out) namd(positions p_in, velocities v_in) {
  namd @p_out @v_out @p_in @v_in stdout=@e_out;
}

positions p[] ;
velocities v[] ;
energies e[] ;

// Initialize first segment in each replica
foreach i in [0:replicas-1] {
  int index = i*exchanges;
  p[index] = initial_positions();
  v[index] = initial_velocities();
}

// Launch data-dependent NAMDs…
iterate j {
  foreach i in [0:replicas-1] {
    int current = i*exchanges + j+1;
    int previous = i*exchanges + j;
    (p[current], v[current], e[current]) = namd(p[previous], v[previous]);
  }
} until (j == exchanges);

53 ExM: Extreme-scale many-task computing
• The JETS work focused on performance issues in parallel task distribution and launch
  – Task generation (dataflow script execution) is still a single node process
  – To run large instances of the script, a distributed memory dataflow system is required
• Project goal: investigate many-task computing on exascale systems
  – Ease of development – fast route to exaflop application
  – Investigate alternative programming models
  – Highly usable programming model: natural concurrency, fault-tolerance
  – Support scientific use cases: batches, scripts, experiment suites, etc.
• Build on and integrate previous successes
  – ADLB: task distributor
  – MosaStore: filesystem cache
  – Swift language: natural concurrency, data specification, etc.

54 Task generation and scalability
• In SwiftScript, all data items are futures
• Progress is enabled when data items are closed, enabling dependent statements to execute
• Not all variables, statements are known at program start
• SwiftScript allows for complex data definitions, conditionals, loops, etc.
• The current Swift implementation constrains the data dependency logic to a single node (as do other systems like Dryad and CIEL) - thus task generation rates are limited
• ExM proposes a fully distributed, scalable task generator and dependency graph – built to express Swift semantics and more

55 Performance target
Performance requirements for distributing the work of Swift-like task generation for an ADLB-like task distributor on an example exascale system (see the arithmetic sketch below):
• Need to utilize O(10^9) concurrency
• For a batch of 1000 tasks per core
  – 10 seconds per task
  – ≈ 2 hour, 46 minute batch
• Tasks: O(10^12)
• Tasks/s: O(10^8)
• Divide cores into workers and control cores
  – Allocate 0.01% as control cores, O(10^5)
  – Each control core must produce O(10^3) = 1000 tasks/second
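The target rates follow directly from the assumptions above; a quick arithmetic sketch with the numbers taken from the slide:

# Back-of-the-envelope for the exascale many-task target on the slide.
cores = 10**9                  # O(10^9) concurrency
tasks_per_core = 1000
seconds_per_task = 10

batch_seconds = tasks_per_core * seconds_per_task     # 10,000 s = 2 h 46 min 40 s
total_tasks = cores * tasks_per_core                  # O(10^12) tasks
task_rate = total_tasks / batch_seconds               # O(10^8) tasks/s

control_cores = cores * 1e-4                          # 0.01% of cores, O(10^5)
rate_per_control_core = task_rate / control_cores     # ~1000 tasks/s each

print(batch_seconds, total_tasks, task_rate, control_cores, rate_per_control_core)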

56 Move to distributed evaluation: Swift/T
(Figure: "Have this:" vs. "Want this:" architecture comparison)

57 What are the challenges in applying the many-task model at extreme scale?
• Dealing with high task dispatch rates: depends on task granularity but is commonly an obstacle
  – Load balancing
  – Global and scoped access to script variables: increasing locality
• Handling tasks that are themselves parallel programs
  – We focus on OpenMP, MPI, and Global Arrays
• Data management for intermediate results
  – Object based abstractions
  – POSIX-based abstractions
  – Interaction between data management and task placement

58 Parallel evaluation of Swift/T in ExM
(Figures: big picture, software components, run time configuration)

59 Fully parallel evaluation of complex scripts
int X = 100, Y = 100;
int A[][];
int B[];
foreach x in [0:X-1] {
  foreach y in [0:Y-1] {
    if (check(x, y)) {
      A[x][y] = g(f(x), f(y));
    } else {
      A[x][y] = 0;
    }
  }
  B[x] = sum(A[x]);
}

60 Swift/T: Basic scaling performance
• Swift/T synthetic app: simple task loop
  – Scales to capacity of ADLB
• SciSolSim: collaboration graph modeling
  – ~400 lines of Swift
  – C++ graph model; simulated annealing
  – Scales well to 4K cores in initial tests (further tests awaiting app debugging)

61 Swift/T: PIPS scaling results
• Scaling result for original application: limited by available data from PIPS team
• Scaling result for current application with parameter sweep over rounding parameter (integer programming)
"One major advantage of Swift is that its high-level programming language considerably shortens coding times, hence it allows more effort to be dedicated to algorithmic developments. Remarkably, using a scripting language has virtually no impact on the solution times and scaling efficiency, as recent Swift/PIPS-L runs show." - Cosmin Petra

62 Swift/T: Use of low-level features
• Variable-sized tasks produce trailing tasks: addressed by exposing ADLB task priorities at the language level

63 FOR MORE INFORMATION and TOOL DOWNLOADS:
• To try Swift: http://www.ci.uchicago.edu/swift
• ParVis project: http://trac.mcs.anl.gov/projects/parvis
• CESM Diagnostic script info at http://www.cesm.ucar.edu
• Pagoda project link: https://svn.pnl.gov/gcrm/wiki/Pagoda
• Extreme-scale Swift: http://www.mcs.anl.gov/exm

64 Parallel Computing, Sep 2011

65 Acknowledgments
• Swift is supported in part by NSF grants OCI-1148443 and PHY-636265, NIH DC08638, and the UChicago SCI Program
• Extreme scaling research on Swift (ExM project) is supported by the DOE Office of Science, ASCR Division
• ParVis is sponsored by the DOE Office of Biological and Environmental Research, Office of Science
• The Swift team: Mihael Hategan, Tim Armstrong, David Kelly, Ian Foster, Dan Katz, Mike Wilde, Zhao Zhang
• Sincere thanks to our scientific collaborators whose usage of Swift was described and acknowledged in this talk

