Experiences Running Seismic Hazard Workflows
Scott Callaghan, Southern California Earthquake Center, University of Southern California
SC13 Workflow BoF

Overview of SCEC Computational Problem
Domain: Probabilistic Seismic Hazard Analysis (PSHA)
PSHA computation is defined as a workflow with two parts:
1. “Strain Green Tensor Workflow”: ~10 jobs, the largest around 4,000 core-hours; output is two 20 GB files (40 GB total)
2. “Post-processing Workflow”: high throughput, ~410,000 serial jobs with runtimes under 1 minute; output is 14,000 files totaling 12 GB
Uses Pegasus-WMS, HTCondor, and Globus (a workflow sketch follows this slide)
Calculates hazard for one location
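To make the two-part structure concrete, here is a minimal sketch of how such a workflow might be expressed with the Pegasus DAX3 Python API (the API generation in use around the time of this work). The transformation names, file names, and arguments below are hypothetical placeholders, not the actual CyberShake executables, and a real DAX contains many more jobs.

```python
# Minimal two-stage PSHA-style workflow sketch using the Pegasus DAX3 API.
# All job and file names are hypothetical; a real CyberShake DAX is far larger.
from Pegasus.DAX3 import ADAG, Job, File, Link

dax = ADAG("psha-site")

# Stage 1: parallel Strain Green Tensor (SGT) calculation (~10 jobs in practice)
sgt_out = File("sgt_volume.grm")
sgt = Job(name="sgt_generate")
sgt.addArguments("--site", "SITE_NAME")
sgt.uses(sgt_out, link=Link.OUTPUT)
dax.addJob(sgt)

# Stage 2: high-throughput post-processing (~410,000 serial tasks in practice,
# collapsed here into a single representative job)
post = Job(name="post_process")
post.addArguments("--input", "sgt_volume.grm")
post.uses(sgt_out, link=Link.INPUT)
dax.addJob(post)

# Post-processing may only start once the SGTs exist
dax.depends(parent=sgt, child=post)

# Write the abstract workflow; pegasus-plan maps it onto execution sites
with open("psha_site.dax", "w") as f:
    dax.writeXML(f)
```

Pegasus plans the abstract DAG onto concrete resources and hands jobs to HTCondor and Globus for execution and data staging, as listed on the slide.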

Recent PSHA Research Calculation Using Workflows: CyberShake Study 13.4
PSHA calculation exploring the impact of alternative velocity models and alternative codes, defined as 1144 workflows
Parallel Strain Green Tensor (SGT) workflow on Blue Waters:
– No remote job submission support in Spring 2013
– Created a basic workflow system using qsub dependencies (see the sketch after this slide)
Post-processing workflow on Stampede:
– Pegasus-MPI-Cluster manages the serial jobs
– A single job is submitted to Stampede
– Master-worker paradigm; work is distributed to workers
– Relatively little output, so writes were aggregated in the master
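Because Blue Waters had no remote job submission in Spring 2013, the SGT stage was driven by plain qsub dependency chains. The sketch below shows the general idea using PBS/Torque's afterok dependencies; the script names and the three-job chain are hypothetical, not the actual CyberShake submission scripts.

```python
# Rough sketch of a qsub dependency chain ("-W depend=afterok:<jobid>").
# Script names are hypothetical placeholders.
import subprocess

def submit(script, after=None):
    """Submit a batch script; if 'after' is given, run only if that job succeeds."""
    cmd = ["qsub"]
    if after:
        cmd += ["-W", "depend=afterok:" + after]
    cmd.append(script)
    # qsub prints the new job ID on stdout
    return subprocess.check_output(cmd).decode().strip()

pre_id = submit("sgt_preprocess.pbs")              # mesh/velocity setup
sgt_id = submit("sgt_solver.pbs", after=pre_id)    # large parallel SGT run
submit("sgt_extract.pbs", after=sgt_id)            # extract SGTs for later stages
```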

CyberShake Workflow Performance Metrics
April 17, 2013 – June 17, 2013 (62 days):
– Running jobs 60% of the time
Blue Waters usage (parallel SGT calculations):
– Average of 19,300 cores across 8 jobs
– Average queue time: 401 sec (34% of job runtime)
Stampede usage (many-task post-processing):
– Average of 1,860 cores across 4 jobs
– 470 million tasks executed (177 tasks/sec)
– 21,912 jobs total
– Average queue time: 422 sec (127% of job runtime)

Constructing Workflow Capabilities Using HPC System Services
Our workflows rely on multiple services:
– Remote gatekeeper
– Remote GridFTP server
– Local GridFTP server
Currently, issues are detected only through workflow failures
It would be helpful to have automated checks that everything is operational (a sketch follows this slide)
Additional factors to check:
– Valid X.509 proxy
– Available disk space
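As an illustration of the kind of automated pre-flight check described above, here is a rough sketch in Python. The hostnames, scratch path, and thresholds are hypothetical; ports 2119 and 2811 are the standard Globus defaults for the GRAM gatekeeper and GridFTP.

```python
# Sketch of automated "is everything up?" checks before submitting workflows.
# Hostnames, paths, and thresholds are hypothetical placeholders.
import shutil
import socket
import subprocess

def tcp_check(host, port, timeout=10):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def proxy_seconds_left():
    """Remaining X.509 proxy lifetime, via 'grid-proxy-info -timeleft'."""
    return int(subprocess.check_output(["grid-proxy-info", "-timeleft"]).strip())

def free_gb(path):
    """Free space (GB) on the filesystem holding 'path'."""
    return shutil.disk_usage(path).free / 1e9

checks = {
    "remote gatekeeper": tcp_check("gatekeeper.example.edu", 2119),
    "remote gridftp":    tcp_check("gridftp.example.edu", 2811),
    "local gridftp":     tcp_check("localhost", 2811),
    "proxy > 12h":       proxy_seconds_left() > 12 * 3600,
    "scratch > 1 TB":    free_gb("/scratch/workflow") > 1000,
}
for name, ok in sorted(checks.items()):
    print("%-18s %s" % (name, "OK" if ok else "FAILED"))
```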

Why Define PSHA Calculations as Multiple Workflows?
– CyberShake Study 13.4 used 1144 workflows
– Too many jobs to merge into a single workflow
– Currently, we create 1144 workflows and monitor their submission via a cron job (a sketch of this approach follows this slide)
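The cron-based monitoring mentioned above could look roughly like the sketch below: each time cron fires, the script counts DAGMan (scheduler-universe) jobs in the local HTCondor queue and, while fewer than a cap are in flight, starts the next planned workflow with pegasus-run. The cap, the pending-list file, and the use of scheduler-universe job counts as a proxy for "running workflows" are all assumptions, not the actual CyberShake scripts.

```python
#!/usr/bin/env python
# Sketch of a cron-driven driver that keeps a bounded number of the planned
# workflows in flight. MAX_RUNNING and the pending-list file are hypothetical.
import subprocess

MAX_RUNNING = 30
PENDING_FILE = "pending_rundirs.txt"   # one planned Pegasus submit dir per line

def count_running_dags():
    """Count DAGMan (scheduler-universe) jobs in the local HTCondor queue."""
    out = subprocess.check_output(
        ["condor_q", "-constraint", "JobUniverse == 7",
         "-format", "%d\n", "ClusterId"])
    return len(out.splitlines())

def submit_next():
    """Start the next planned workflow with pegasus-run; return False if none left."""
    with open(PENDING_FILE) as f:
        rundirs = [line.strip() for line in f if line.strip()]
    if not rundirs:
        return False
    subprocess.check_call(["pegasus-run", rundirs[0]])
    with open(PENDING_FILE, "w") as f:
        f.writelines(d + "\n" for d in rundirs[1:])
    return True

if __name__ == "__main__":
    while count_running_dags() < MAX_RUNNING and submit_next():
        pass
```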

Improving Workflow Capabilities
Workflow tools could provide higher-level organization:
– Specify a set of workflows as a group
– Workflows in the group are submitted and monitored
– Errors are detected and the group is paused
– Easy to obtain status and progress
– Statistics are aggregated over the group

Summary of SCEC Workflow Needs
Workflow tools for large calculations on leadership-class systems that do not support remote job submission:
– Need capabilities to define job dependencies, manage failure and retry, and track files using a DBMS
Lightweight workflow tools that can be included in scientific software releases:
– For unsophisticated users, the complexity of workflow tools should not outweigh their advantages
Methods to define workflows with interactive (people-in-the-loop) stages:
– Needed to automate computational pathways that have manual stages
Methods to gather both scientific and computational metadata into databases:
– Needed to ensure reproducibility and support user interfaces