
1. Grid Computing: From Old Traces to New Applications
Fribourg, Switzerland
Alexandru Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema
Parallel and Distributed Systems Group, TU Delft
Big thanks to our collaborators: U Wisc./Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …
(Slide logos: DGSim, The Failure Trace Archive)

2. About the Speaker
Systems:
- The Koala grid scheduler
- The Tribler BitTorrent-compatible P2P file-sharing system
- The POGGI and CAMEO gaming platforms
Performance:
- The Grid Workloads Archive (Nov 2006)
- The Failure Trace Archive (Nov 2009)
- The Peer-to-Peer Trace Archive (Apr 2010)
- Tools: DGSim trace-based grid simulator, GrenchMark workload-based grid benchmarking
Team of 15+ active collaborators in NL, AT, RO, US.
Happy to be in Berkeley until September.

3. The Grid
A ubiquitous, always-on computational and data-storage platform on which users can seamlessly run their (large-scale) applications.
Shared capacity and costs; economies of scale.

4. The Dutch Grid: DAS System and Extensions
DAS-3: a 5-cluster grid
- VU (85 nodes), TU Delft (68), Leiden (32), UvA/MultimediaN (46), UvA/VL-e (41)
- SURFnet6, 10 Gb/s lambdas
- 272 AMD Opteron nodes: 792 cores, 1 TB memory
- Heterogeneous: 2.2-2.6 GHz single/dual-core nodes
- Myrinet-10G (excl. Delft), Gigabit Ethernet
DAS-4 (upcoming): multi-cores, general purpose, GPU, Cell, …
Clouds: Amazon EC2+S3, Mosso, …

5. Many Grids Built
DAS, Grid'5000, OSG, NGS, CERN, …
Why grids and not The Grid?

6. Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion

7. The Failure Trace Archive
Failure and recovery events; 20+ traces online: http://fta.inria.fr
D. Kondo, B. Javadi, A. Iosup, D. Epema, The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, CCGrid 2010 (Best Paper Award).
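As a small illustration of the comparative analysis such traces enable, the sketch below computes per-node MTBF and MTTR from a list of failure/recovery events. The (node, fail_time, recover_time) tuples and their values are hypothetical; this is not the FTA's actual trace format.

```python
# Minimal sketch: per-node MTBF/MTTR from failure/recovery events.
# The (node, fail_time, recover_time) tuples are a hypothetical format,
# not the actual FTA trace layout.
from collections import defaultdict

events = [
    # node id, failure time [s], recovery time [s]
    ("n1", 1_000, 1_600),
    ("n1", 50_000, 53_000),
    ("n2", 8_000, 8_900),
]

by_node = defaultdict(list)
for node, fail, recover in events:
    by_node[node].append((fail, recover))

for node, evs in sorted(by_node.items()):
    evs.sort()
    # Time between failures approximated as the gap between consecutive failures.
    tbf = [b_fail - a_fail for (a_fail, _), (b_fail, _) in zip(evs, evs[1:])]
    ttr = [recover - fail for fail, recover in evs]
    mtbf = sum(tbf) / len(tbf) if tbf else float("nan")
    mttr = sum(ttr) / len(ttr)
    print(f"{node}: MTBF={mtbf:.0f}s MTTR={mttr:.0f}s over {len(evs)} failures")
```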

8. Was it the System?
No:
- The system can grow fast
- Good data and models to support system designers
Yes:
- Grid middleware unscalable [CCGrid06, HPDC09]
- Grid middleware failure-prone [CCGrid07, Grid07]
- Grid resources unavailable [CCGrid10]
- Inability to load balance well [SC|07]
- Poor online information about resource availability

9. Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion

10. The Grid Workloads Archive
Per-job arrival, start, stop, structure, etc.; 6 traces online: http://gwa.ewi.tudelft.nl
Largest traces: over 1.5 yrs, >750K jobs, >250 users.
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672-686, 2008.

11. How Are Real Grids Used? Data Analysis and Modeling
Grids vs. parallel production environments (clusters and small supercomputers):
- Bags of single-processor tasks vs. single parallel jobs
- Bigger bursts of job arrivals
- More jobs

12. Grid Workloads Analysis: Grid Workload Components
Bags-of-Tasks (BoTs):
- BoT size = 2-70 tasks, most 5-20
- Task runtime highly variable, from minutes to tens of hours
Workflows (WFs):
- WF size = 2-1k tasks, most 30-40
- Task runtime of minutes

13. Was it the Workload?
No:
- Similar workload characteristics across grids
- High utilization possible due to single-node jobs
- High load imbalance
- Good data and models to support system designers [Grid06, EuroPar08, HPDC08-10, FGCS08]
Yes:
- Too many tasks (system limitation)
- Poor online information about job characteristics + high variability of job resource requirements
- How to schedule BoTs, WFs, and mixtures in grids?

14. Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion

15. Problems in Grid Scheduling and Resource Management: The System
1. Grid schedulers do not own resources themselves
   - They have to negotiate with autonomous local schedulers
   - Authentication / multi-organizational issues
2. Grid schedulers interface to local schedulers
   - Some may have support for reservations, others are queuing-based
3. Grid resources are heterogeneous and dynamic
   - Hardware (processor architecture, disk space, network)
   - Basic software (OS, libraries)
   - Grid software (middleware)
   - Resources may fail
   - Lack of complete and accurate resource information

16. Problems in Grid Scheduling and Resource Management: The Workloads
4. Workloads are heterogeneous and dynamic
   - Grid schedulers may not have control over the full workload (multiple submission points)
   - Jobs may have performance requirements
   - Lack of complete and accurate job information
5. Application structure is heterogeneous
   - Single sequential jobs
   - Bags of tasks; parameter sweeps (Monte Carlo), pilot jobs
   - Workflows, pipelines, chains-of-tasks
   - Parallel jobs (MPI); malleable, co-allocated

17. The Koala Grid Scheduler
- Developed in the DAS system; deployed on DAS-2 in September 2005, ported to DAS-3 in April 2007
- Independent of grid middleware such as Globus; runs on top of local schedulers
Objectives:
- Data and processor co-allocation in grids
- Supporting different application types
- Specialized application-oriented scheduling policies
Koala homepage: http://www.st.ewi.tudelft.nl/koala/

18. Koala in a Nutshell
- Parallel applications (MPI, Ibis, …): co-allocation, malleability
- Parameter sweep applications: cycle scavenging, run as low-priority jobs
- Workflows
A bridge between theory and practice.

19. Inter-Operating Grids Through Delegated MatchMaking: Inter-Operation Architectures
Independent, centralized, hierarchical, decentralized, and hybrid hierarchical/decentralized (Delegated MatchMaking).

20. Inter-Operating Grids Through Delegated MatchMaking: The Delegated MatchMaking Mechanism
1. Deal with local load locally (if possible).
2. When local load is too high, temporarily bind resources from remote sites to the local environment. May build delegation chains. Delegate resource usage rights, do not migrate jobs.
3. Deal with delegations each delegation cycle (delegated matchmaking).
In short: delegate resource usage rights, do not delegate jobs.
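To make the mechanism concrete, here is a minimal sketch of one delegation cycle: an overloaded site requests resource usage rights from a neighbor, which grants what it can from its idle nodes and delegates the remainder along a chain. The load threshold, neighbor order, and site names are illustrative assumptions, not the actual DMM implementation.

```python
# Minimal sketch of one Delegated MatchMaking cycle: delegate resource
# usage rights toward overloaded sites; never migrate the jobs themselves.
# Thresholds, site names, and neighbor order are illustrative placeholders.

class Site:
    def __init__(self, name, idle_nodes, queued_jobs):
        self.name = name
        self.idle_nodes = idle_nodes     # locally owned, currently idle nodes
        self.queued_jobs = queued_jobs   # single-node jobs, for simplicity
        self.neighbors = []
        self.borrowed = 0                # usage rights obtained from remote sites

    def load(self):
        return self.queued_jobs / max(self.idle_nodes + self.borrowed, 1)

def request_rights(site, wanted, chain):
    """Obtain up to 'wanted' usage rights, delegating the remainder along a chain."""
    if site.name in chain:               # avoid delegation cycles
        return 0
    chain = chain | {site.name}
    granted = min(site.idle_nodes, wanted)
    site.idle_nodes -= granted
    wanted -= granted
    for nb in site.neighbors:            # build a delegation chain if needed
        if wanted == 0:
            break
        extra = request_rights(nb, wanted, chain)
        granted += extra
        wanted -= extra
    return granted

def dmm_cycle(sites, overload=1.5):
    for s in sites:
        if s.load() <= overload:         # rule 1: deal with local load locally
            continue
        deficit = s.queued_jobs - (s.idle_nodes + s.borrowed)
        for nb in s.neighbors:           # rule 2: temporarily bind remote resources
            if deficit <= 0:
                break
            got = request_rights(nb, deficit, {s.name})
            s.borrowed += got
            deficit -= got

delft, leiden, vu = Site("delft", 2, 10), Site("leiden", 6, 1), Site("vu", 8, 0)
delft.neighbors, leiden.neighbors = [leiden], [vu]
dmm_cycle([delft, leiden, vu])
print(delft.borrowed, leiden.idle_nodes, vu.idle_nodes)  # rights flow toward the overloaded site
```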

21. What is the Potential Gain of Grid Inter-Operation? Delegated MatchMaking vs. Alternatives
DMM achieves:
- High goodput
- Low wait time
- Finishes all jobs
- Even better for load imbalance between grids
- Reasonable overhead [see thesis]
(Compared alternatives: Independent, Centralized, Decentralized; higher is better.)
Grid inter-operation (through DMM) delivers good performance.

22. 4.2. Studies on Grid Scheduling [5/5]: Scheduling under Cycle Stealing
Requirements:
1. Unobtrusiveness: minimal delay for (higher-priority) local and grid jobs
2. Fairness
3. Dynamic resource allocation
4. Efficiency
5. Robustness and fault tolerance
Deployed as the Koala CS-Runner: users submit PSAs (parameter sweep applications); launchers deploy, monitor, and preempt tasks on idle/demanded resources, coordinated with the cluster head nodes.
CS policies: Equi-All (grid-wide basis), Equi-PerSite (per cluster).
Application-level scheduling: pull-based approach, shrinkage policy.
O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, D. Epema: Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems. CCGRID 2009: 12-19.
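The two CS policies differ in the granularity at which idle nodes are equipartitioned over the running PSAs: Equi-All splits the grid-wide pool, Equi-PerSite splits each cluster's idle nodes separately. A minimal sketch of that difference, with made-up site names and counts:

```python
# Sketch of the two cycle-scavenging equipartition policies.
# Site names, PSA names, and idle-node counts are made up for illustration.

def equipartition(n_items, n_parts):
    """Split n_items as evenly as possible over n_parts."""
    base, extra = divmod(n_items, n_parts)
    return [base + (1 if i < extra else 0) for i in range(n_parts)]

def equi_all(idle_per_site, psas):
    """Equi-All: equipartition the grid-wide pool of idle nodes over the PSAs."""
    total_idle = sum(idle_per_site.values())
    return dict(zip(psas, equipartition(total_idle, len(psas))))

def equi_per_site(idle_per_site, psas):
    """Equi-PerSite: equipartition the idle nodes of each cluster separately."""
    alloc = {p: 0 for p in psas}
    for site, idle in idle_per_site.items():
        for psa, share in zip(psas, equipartition(idle, len(psas))):
            alloc[psa] += share
    return alloc

idle = {"delft": 10, "vu": 3, "leiden": 5}     # idle nodes per cluster (made up)
psas = ["psa-1", "psa-2"]
print("Equi-All:    ", equi_all(idle, psas))
print("Equi-PerSite:", equi_per_site(idle, psas))
```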

23. Was it the System Designer?
No:
- Mechanisms to inter-operate grids: DMM [SC|07], …
- Mechanisms to run many grid application types: WFs, BoTs, parameter sweeps, cycle scavenging, …
- Scheduling algorithms with inaccurate information [HPDC '08, '09, '10]
- Tools for empirical and trace-based experimentation
Yes:
- Still too many tasks
- What about new application types?

24. Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion

25. MSGs are a Popular, Growing Market
- 25,000,000 subscribed players (from 150,000,000+ active)
- Over 10,000 MSGs in operation
- Market size: $7,500,000,000/year
Sources: MMOGChart, own research; ESA, MPAA, RIAA.

26. Massively Social Gaming as a New Grid/Cloud Application
Massively Social Gaming: (online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience.
1. Virtual world: explore, do, learn, socialize, compete
2. Content: graphics, maps, puzzles, quests, culture
3. Game analytics: player stats and relationships (e.g., Romeo and Juliet)
[SC|08, TPDS'10] [EuroPar09 Best Paper Award, CPE10] [ROIA09]

27. Suggestions for Collaboration
- Scheduling mixtures of grid/HPC/cloud workloads: scheduling and resource management in practice; modeling aspects of cloud infrastructure and workloads; Condor on top of Mesos
- Massively Social Gaming and Mesos: step 1, game analytics and social network analysis in Mesos
- The Grid Research Toolbox: using and sharing traces (The Grid Workloads Archive, The Failure Trace Archive); GrenchMark (testing large-scale distributed systems); DGSim (simulating multi-cluster grids)

28. Thank you! Questions? Observations?
Alex Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Dick Epema
More information:
- The Koala Grid Scheduler: www.st.ewi.tudelft.nl/koala
- The Grid Workloads Archive: gwa.ewi.tudelft.nl
- The Failure Trace Archive: fta.inria.fr
- The DGSim simulator: www.pds.ewi.tudelft.nl/~iosup/dgsim.php
- The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
- Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html
- Gaming research: www.st.ewi.tudelft.nl/~iosup/research_gaming.html
- PDS publication database: www.pds.twi.tudelft.nl/
- Email: A.Iosup@tudelft.nl
Big thanks to our collaborators: U Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …

29. The 1M-CPU Machine with Shared Resource Ownership
The 1M-CPU machine: eScience (high-energy physics, earth sciences, financial services, bioinformatics, etc.)
Shared resource ownership:
- Shared resource acquisition
- Shared maintenance and operation
- Summed capacity higher (more efficiently used) than the sum of individual capacities

30. How to Build the 1M-CPU Machine with Shared Resource Ownership?
Clusters of resources are ever more present:
- Top500 supercomputers: cluster systems grew from 0% to 75% share in 10 years (and from 0% to 50% of performance). Source: http://www.top500.org/overtime/list/29/archtype/
- CERN WLCG: from 100 to 300 clusters in 2½ years. Source: http://goc.grid.sinica.edu.tw/gstat//table.html

31. How to Build the 1M-CPU Machine with Shared Resource Ownership?
Top500 system-size growth over the last 10 years: median 10x, average 20x, max 100x; over the last 4 years, roughly 0.5x/yr. Data source: http://www.top500.org
To build the 1M-CPU cluster:
- At the last-10-years rate, another 10 years
- At the current rate, another 200 years

32. How to Build the 1M-CPU Machine with Shared Resource Ownership? Cluster-Based Computing Grids
CERN WLCG cluster size over time: median +5 procs/yr, average +15 procs/yr, max 2x/yr.
Shared clusters grow on average slower than Top500 cluster systems!
Data source: http://goc.grid.sinica.edu.tw/gstat/

33. How to Build the 1M-CPU Machine with Shared Resource Ownership?
Why doesn't CERN WLCG use larger clusters?
- Physics: dissipating heat from large clusters
- Market: pay industrial power-consumer rates, pay special system-building rates
- Collaboration: who pays for the largest cluster?
Why doesn't CERN WLCG opt for multi-cores?
- We don't know how to exploit multi-cores yet
- Executing large batches of independent jobs

34. Massively Social Gaming on Clouds
Current technology: upfront payment; cost and scalability problems; makes players unhappy.
The future: happy players, happy cloud operators; a million-user, multi-billion-dollar market; content, world simulation, and analytics for MSGs.
Our vision: scalability and automation; economy of scale with clouds.
Ongoing work: content (POGGI framework), platform (edutain@grid), analytics (CAMEO framework).
Publications on gaming and clouds: 2008: ACM SC, TR Perf; 2009: ROIA, CCGrid, NetGames, EuroPar (Best Paper Award), CloudComp, TR variability; 2010: IEEE TPDS, Elsevier CCPE; 2011: book chapter.
CAMEO graduation forecast 2010/2011: 1 PhD, 2 MSc, 4 BSc.

35. 4.1. Grid Workloads [2/5]: BoTs are Predominant in Grids (Selected Findings)
- Batches are predominant in grid workloads; up to 96% of CPU time
- Average batch size (Δ≤120s) is 15-30 (500 max)
- 75% of the batches comprise 20 jobs or fewer

              Grid'5000      NorduGrid       GLOW (Condor)
Submissions   26k            50k             13k
Jobs          808k (951k)    738k (781k)     205k (216k)
CPU time      193y (651y)    2192y (2443y)   53y (55y)

A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, vol. 4641, pp. 382-393, 2007.
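The batch identification behind these numbers groups jobs of the same user whose consecutive arrivals are at most Δ = 120 s apart. A minimal sketch of that grouping over a hypothetical list of (user, submit time) records:

```python
# Sketch of BoT identification: jobs of one user whose consecutive
# arrivals are at most DELTA seconds apart form one batch.
# The job records below are made up for illustration.
from collections import defaultdict

DELTA = 120  # seconds, as in the analysis above

jobs = [("alice", 0), ("alice", 30), ("alice", 100),
        ("alice", 400), ("bob", 10), ("bob", 500)]

by_user = defaultdict(list)
for user, t in jobs:
    by_user[user].append(t)

batches = []
for user, times in by_user.items():
    times.sort()
    current = [times[0]]
    for prev, t in zip(times, times[1:]):
        if t - prev <= DELTA:
            current.append(t)
        else:
            batches.append((user, current))
            current = [t]
    batches.append((user, current))

sizes = [len(b) for _, b in batches]
print("batches:", batches)
print("average batch size:", sum(sizes) / len(sizes))
```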

36. System Availability Characteristics: Resource Evolution
Grids grow by cluster.

37. System Availability Characteristics: Grid Dynamics
Grids shrink temporarily (grid-level view); average availability: 69%.

38. Resource Availability Model
- Assume no correlation of failure occurrence between clusters
- Which site/cluster fails: f_s, the fraction of failures at cluster s
- Per-cluster statistics: MTBF, MTTR, correlation
- Weibull distribution for failure inter-arrival times (IAT): the longer a node is online, the higher the chance that it will fail
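A minimal sketch of drawing failures from such a model: inter-arrival times come from a Weibull distribution (a shape parameter above 1 gives the increasing hazard described above), and each failure is assigned to a cluster according to the fractions f_s. All parameter values below are placeholders, not the fitted values.

```python
# Sketch of the resource availability model: Weibull failure
# inter-arrival times, failures assigned to clusters by fraction f_s.
# All parameter values are illustrative placeholders, not fitted values.
import random

random.seed(42)

f_s   = {"cluster-A": 0.5, "cluster-B": 0.3, "cluster-C": 0.2}  # fraction of failures per cluster
shape = 1.5     # Weibull shape > 1: hazard increases with uptime
scale = 3600.0  # Weibull scale [s]

def sample_failures(n):
    t = 0.0
    clusters, weights = zip(*f_s.items())
    for _ in range(n):
        t += random.weibullvariate(scale, shape)      # inter-arrival time between failures
        site = random.choices(clusters, weights)[0]   # which cluster fails
        yield t, site

for t, site in sample_failures(5):
    print(f"t={t:8.0f}s  failure at {site}")
```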

39. Grid Workloads: Load Imbalance Across Sites and Grids
- Overall workload imbalance: normalized daily load up to 5:1
- Temporary workload imbalance: hourly load up to 1000:1
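One way to read these ratios: normalize each site's load by its capacity and compare the most- to the least-loaded site. The sketch below assumes load = jobs submitted per period and capacity = number of nodes; both the definition and the numbers are illustrative, not the exact metric of the analysis.

```python
# Sketch of computing load imbalance across sites: normalized load is
# jobs per period divided by site capacity; imbalance is the ratio of
# the maximum to the minimum normalized load. Numbers are made up.

capacity = {"delft": 68, "vu": 85, "leiden": 32}         # nodes per site
daily_jobs = {"delft": 5000, "vu": 4000, "leiden": 400}  # jobs per day (made up)

def imbalance(jobs, capacity):
    norm = {s: jobs[s] / capacity[s] for s in jobs}
    return max(norm.values()) / max(min(norm.values()), 1e-9)

print(f"daily imbalance ~ {imbalance(daily_jobs, capacity):.0f}:1")
```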

40. 4.1. Grid Workloads [4/5]: Modeling Grid Workloads, Feitelson adapted
- Adapted to grids: percentage of parallel jobs, other values
- Validated with 4 grid and 7 parallel production environment traces
A. Iosup, T. Tannenbaum, M. Farrellee, D. Epema, M. Livny: Inter-operating grids through Delegated MatchMaking. SC|07 (Nominated for Best Paper Award).

41. Grid Workloads: Modeling Grid Workloads, adding users and BoTs
- Single arrival process for both BoTs and parallel jobs
- Reduces over-fitting and complexity of "Feitelson adapted" by removing the RunTime-Parallelism correlated model
- Validated with 7 grid workloads
A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema. The Performance of Bags-of-Tasks in Large-Scale Distributed Systems, HPDC, pp. 97-108, 2008.
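A minimal sketch of what such a generator looks like: one arrival process (Poisson here, as a placeholder) emits submissions, and each submission becomes either a bag of single-processor tasks or a parallel job. All distributions and parameters are placeholders, not the fitted model from the paper.

```python
# Sketch of a grid workload generator with a single arrival process
# for both BoTs and parallel jobs. Distributions and parameters are
# illustrative placeholders, not the fitted model from the paper.
import random

random.seed(1)

P_BOT        = 0.7      # fraction of submissions that are bags-of-tasks
MEAN_IAT     = 60.0     # mean inter-arrival time between submissions [s]
MEAN_RUNTIME = 600.0    # mean task runtime [s]

def generate(n_submissions):
    t = 0.0
    for job_id in range(n_submissions):
        t += random.expovariate(1.0 / MEAN_IAT)          # single arrival process
        if random.random() < P_BOT:
            size = random.randint(5, 20)                  # BoT of single-processor tasks
            kind, cpus = "BoT", 1
        else:
            size = 1                                      # one parallel job
            kind, cpus = "parallel", random.choice([2, 4, 8, 16])
        runtime = random.expovariate(1.0 / MEAN_RUNTIME)
        yield {"id": job_id, "submit": t, "kind": kind,
               "tasks": size, "cpus_per_task": cpus, "runtime": runtime}

for job in generate(5):
    print(job)
```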

42. How to Compare Existing and New Grid Systems? The Delft Grid Simulator (DGSim)
- Discrete-event simulator
- Generates realistic workloads
- Automates the simulation process (10,000s of tasks)
DGSim: www.pds.ewi.tudelft.nl/~iosup/dgsim.php
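The core of a trace-based grid simulator is a discrete-event loop over job arrival and completion events. The toy sketch below schedules single-processor jobs FCFS onto the cluster with the most free nodes; it illustrates the idea only and is not DGSim itself.

```python
# Toy discrete-event simulation of FCFS scheduling on a multi-cluster
# grid, in the spirit of trace-based simulators like DGSim (this is an
# illustration, not DGSim itself). Jobs are single-processor for brevity.
import heapq

clusters = {"delft": 68, "vu": 85}                  # free nodes per cluster (made up)
trace = [(0, 120), (5, 300), (10, 60), (12, 600)]   # (submit time, runtime) per job

events = [(submit, "arrive", runtime) for submit, runtime in trace]
heapq.heapify(events)
queue, finished = [], []

def try_start(now):
    while queue:
        runtime = queue[0]
        site = max(clusters, key=clusters.get)      # cluster with the most free nodes
        if clusters[site] == 0:
            return                                  # nothing free anywhere
        clusters[site] -= 1
        queue.pop(0)
        heapq.heappush(events, (now + runtime, "finish", site))

while events:
    now, kind, payload = heapq.heappop(events)
    if kind == "arrive":
        queue.append(payload)                       # payload = runtime
    else:
        clusters[payload] += 1                      # payload = cluster name
        finished.append(now)
    try_start(now)

print("makespan:", max(finished))
```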

43. How to Inter-Operate Grids? Existing (Working?) Alternatives
Architectures: independent, centralized, hierarchical, decentralized.
Example systems: Condor, Globus GRAM, Alien, Koala, OAR, CCS, Moab/Torque, OAR2, NWIRE, OurGrid, Condor flocking.
Open questions per architecture: load imbalance? resource selection? scale? root ownership? node failures? accounting? trust?

44. Inter-Operating Grids Through Delegated MatchMaking [1/3]: The Delegated MatchMaking Architecture
1. Start from a hierarchical architecture
2. Let roots exchange load
3. Let siblings exchange load
The Delegated MatchMaking architecture is a hybrid hierarchical/decentralized architecture for grid inter-operation.

45. Problems in Grid Scheduling and Resource Management: New Hypes, New Focus for Designers
6. Clouds
   - Large-scale, loosely coupled infrastructure and/or platform
   - Computation and storage have fixed costs (?)
   - Guaranteed good performance, e.g., no wait time (?)
   - Easy to port grid applications to clouds (?)
7. Multi-cores
   - Small- and mid-scale, tightly coupled infrastructure
   - Computation and storage have lower cost than grid (?)
   - Good performance (?)
   - Easy to port grid applications to multi-cores (?)

