CyberShake Study 18.8 Planning

Presentation transcript:

CyberShake Study 18.8 Planning
Scott Callaghan, Southern California Earthquake Center

Study 18.8 Goals
- Perform physics-based seismic hazard analysis in the extended San Francisco Bay Area
- Will perform CyberShake calculations for 869 locations
- New combination of velocity models
- Area of overlap with previous studies to examine model impact
- We anticipate this will be the largest CyberShake study to date

Execution Plan
- CyberShake consists of two parts:
  - Strain Green Tensor (SGT) workflow
    - Dominated by 2 MPI GPU jobs, 40-80 minutes on 800 nodes
    - 48K SUs total (see the rough cost check below)
  - Post-processing workflow
    - Dominated by a master/worker MPI CPU job, 2-14 hours on 240 nodes
    - More heterogeneity than in previous studies due to volume sizes
- In 2017, ran both workflows on both Blue Waters and Titan; plan to use a similar approach in 2018
- Goals are to minimize makespan and human involvement
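
The SU figures above can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate only; it assumes Titan's charging rate of roughly 30 SUs per node-hour and that the 25.1M-SU Titan request (next slide) covers about 25% of the 869 sites. The actual study plan was based on detailed benchmarking, not this calculation.

```python
# Back-of-the-envelope check of the per-site and total SU figures.
# Assumptions (not on the slide): ~30 SUs per Titan node-hour, and the
# Titan request covers ~25% of the 869 sites.

SU_PER_NODE_HOUR = 30      # assumed Titan charge rate
SITES_TOTAL = 869
TITAN_FRACTION = 0.25

# SGT workflow: 2 GPU jobs of 40-80 minutes each on 800 nodes
sgt_node_hours = 2 * 800 * ((40 + 80) / 2 / 60)     # ~1600 node-hours per site
sgt_sus = sgt_node_hours * SU_PER_NODE_HOUR         # ~48K SUs, matching the slide

# Post-processing workflow: one 240-node job of 2-14 hours
pp_node_hours = 240 * ((2 + 14) / 2)                # ~1920 node-hours per site
pp_sus = pp_node_hours * SU_PER_NODE_HOUR           # ~58K SUs

titan_sites = SITES_TOTAL * TITAN_FRACTION          # ~217 sites
titan_total = titan_sites * (sgt_sus + pp_sus)      # ~23M SUs, in line with the 25.1M request

print(f"SGT per site:   {sgt_sus / 1e3:.0f}K SUs")
print(f"PP per site:    {pp_sus / 1e3:.0f}K SUs")
print(f"Titan estimate: {titan_total / 1e6:.1f}M SUs (requested: 25.1M)")
```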

Execution Plan, cont.
- Blue Waters allocation expires on 8/31; plan to run 75% of sites on Blue Waters and 25% on Titan
  - 25.1M SUs on Titan, 51% of the remaining 2018 allocation
  - Will adjust the split depending on throughput
- To support automated remote execution, will use the rvGAHP approach, the same method as in 2017 (sketched below)
  - A daemon runs on a Titan login node
  - It makes an SSH connection back to the SCEC workflow submission host
  - It listens for job submissions, translates them, and inserts them into the batch queue
- Estimated duration of 4 weeks
- Data transfer
  - 54 TB of Titan SGTs archived on Blue Waters
  - 3 TB of output data transferred to SCEC storage
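
The sketch below illustrates the push-reversal idea only; it is not the actual rvGAHP implementation, which tunnels the HTCondor GAHP protocol over the SSH channel. The host name, remote command, and job-request format are hypothetical, and Titan's PBS/Moab `qsub` is assumed as the local batch interface.

```python
#!/usr/bin/env python3
"""Minimal sketch of reverse-connection job submission (not the real rvGAHP)."""
import json
import subprocess

SUBMIT_HOST = "workflow.scec.example.org"  # hypothetical SCEC submission host
REMOTE_FEED = "cybershake-job-feed"        # hypothetical command printing one JSON job per line

def main():
    # The daemon on the Titan login node dials OUT, so OLCF never needs to
    # accept inbound connections from the workflow submission host.
    ssh = subprocess.Popen(["ssh", SUBMIT_HOST, REMOTE_FEED],
                           stdout=subprocess.PIPE, text=True)

    for line in ssh.stdout:
        request = json.loads(line)  # e.g. {"nodes": 240, "walltime": "6:00:00", "command": "..."}

        # Translate the request into a PBS batch script and submit it locally.
        script = ("#!/bin/bash\n"
                  f"#PBS -l nodes={request['nodes']}\n"
                  f"#PBS -l walltime={request['walltime']}\n"
                  f"{request['command']}\n")
        result = subprocess.run(["qsub"], input=script,
                                capture_output=True, text=True)
        print("submitted job:", result.stdout.strip())

if __name__ == "__main__":
    main()
```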

Staff
- SCEC: Scott Callaghan, Phil Maechling, Christine Goulet, Mei-Hui Su, Kevin Milner, John Yu
- Pegasus: Karan Vahi, Mats Rynge
- OLCF: Judy Hill

Requests
- Longer runtimes for 240-node jobs
  - Post-processing jobs may take up to 14 hours
  - Since the maximum queue time for 240-node jobs is 6 hours, they must restart from checkpoint
  - 1-2 hours of runtime lost due to restarts (see the rough overhead estimate below)
- Disk usage OK
  - Cumulative disk usage of 200 TB; will clean up as we go
- Changes from the 2017 study for increased throughput
  - 5-day priority boost
  - 5 jobs running in bin 5
  - 8 jobs eligible to run
- Scheduled downtime?
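
A quick estimate of what the 6-hour limit costs, using only the numbers on this slide. It assumes the quoted 1-2 hours of lost runtime is the total across restarts for a worst-case job; the figures are illustrative, not measured.

```python
# Rough estimate of the overhead of fitting a 14-hour post-processing job
# into 6-hour queue windows, using the figures on this slide.
import math

QUEUE_LIMIT_H = 6       # current max walltime for a 240-node job
JOB_LENGTH_H = 14       # worst-case post-processing runtime
NODES = 240
LOST_H = (1, 2)         # total runtime lost to checkpoint restarts (from the slide)

submissions = math.ceil(JOB_LENGTH_H / QUEUE_LIMIT_H)   # 3 separate submissions
restarts = submissions - 1                              # 2 restarts from checkpoint
lost_node_hours = [h * NODES for h in LOST_H]           # 240-480 node-hours per site

print(f"{submissions} submissions ({restarts} checkpoint restarts) per worst-case job;")
print(f"{lost_node_hours[0]}-{lost_node_hours[1]} node-hours of recomputation per site,")
print("plus the extra queue waits, which a longer walltime limit would avoid.")
```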