dV/dt: Accelerating the Rate of Progress Towards Extreme Scale Collaborative Science. Miron Livny (UW), Ewa Deelman (USC/ISI), Douglas Thain (ND), Frank Wuerthwein (UCSD), Bill Allcock (ANL).

Presentation transcript:

dV/dt : Accelerating the Rate of Progress Towards Extreme Scale Collaborative Science Miron Livny (UW), Ewa Deelman (USC/ISI), Douglas Thain (ND), Frank Wuerthwein (UCSD), Bill Allcock (ANL) … make it easier for scientists to conduct large- scale computational tasks that use the power of computing resources they do not own to process data they did not collect with applications they did not develop …

A (slightly) different definition of High Throughput Computing: Offered Load >> System Capacity. The job of an HTC system is to maximize concurrency without exceeding resources. But this only works if the resources are known!

The Ideal Picture: a Resource Manager with 2000 cores receives 10,000 tasks and runs as many tasks at a time as its cores allow.

What actually happens: the same Resource Manager (10,000 tasks, 2000 cores) is asked to run tasks that also need resources such as 1 TB of data, a GPU, 3M files of 1K each, 128 GB of RAM, and 16 cores.

Some reasonable questions: Will this workload run at all on machine X? How many workloads can I run simultaneously without running out of storage space? Did this workload actually behave as expected when run on a new machine? How is run X different from run Y? If my workload wasn't able to run on this machine, where can I run it?

End users have no idea what resources their applications actually need. and… Computer systems are terrible at describing their capabilities and limits. and… Nobody likes to say NO.

Stages of Resource Management: estimate the application's resource needs; find the appropriate computing resources; acquire those resources; deploy applications and data on the resources; manage applications and resources during the run. Can we do it for one task? How about an app composed of many tasks?

Categories of Applications: static workloads expressed as regular or irregular task graphs, concurrent workloads, and dynamic workloads driven by a submit/wait loop: while( more work to do) { foreach work unit { t = create_task(); submit_task(t); } t = wait_for_task(); process_result(t); }
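
As a concrete illustration of the dynamic submit/wait pattern above, here is a minimal sketch using the Work Queue Python bindings from CCTools. The command, file names, and port are placeholders, and method names may vary slightly between CCTools releases.

```python
# Minimal sketch of a dynamic workload: create and submit tasks, then wait
# for results as they finish. Command, file names, and port are placeholders.
import work_queue as wq

q = wq.WorkQueue(port=9123)                 # workers connect to this master

work_units = ["param_%d.txt" % i for i in range(100)]   # hypothetical inputs

for unit in work_units:                     # foreach work unit: create and submit
    t = wq.Task("./mysim.exe < %s > %s.out" % (unit, unit))
    t.specify_input_file("mysim.exe")
    t.specify_input_file(unit)
    t.specify_output_file(unit + ".out")
    q.submit(t)

while not q.empty():                        # wait for tasks and process results
    t = q.wait(5)                           # returns a completed task or None
    if t:
        print("task %d finished with status %d" % (t.id, t.return_status))
```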

CyberShake PSHA Workflow (Southern California Earthquake Center). Description: builders ask seismologists, "What will the peak ground motion be at my new building in the next 50 years?" Seismologists answer this question using Probabilistic Seismic Hazard Analysis (PSHA). The study comprises 239 workflows; each site in the input map corresponds to one workflow. Each workflow has 820,000 tasks, MPI codes (~12,000 CPU hours), post processing (~2,000 CPU hours), and a data footprint of ~800 GB. The workflows are managed by Pegasus as workflow ensembles.

Bioinformatics Portal Generates Workflows for Makeflow: BLAST (small), 17 sub-tasks, ~4 h on 17 nodes; BWA, 825 sub-tasks, ~27 m on 100 nodes; SHRiMP, 5080 sub-tasks, ~3 h on 200 nodes.

Example: Adaptive Weighted Ensemble (AWE). The AWE logic runs on Work Queue across Amazon EC2, an HPC cluster, a GPU cluster, Condor, and a personal cloud. (Badi Abdul-Wahid, Li Yu, Dinesh Rajan, Haoyun Feng, Eric Darve, Douglas Thain, Jesus A. Izaguirre, "Folding Proteins at 500 ns/hour with Work Queue," IEEE e-Science Conference.) Steps: 1: start with a standard MD simulation; 2: research goal: generate a network of states; 3: build the AWE logic using the Work Queue library; 4: run on CPUs/GPUs at Notre Dame, Stanford, and Amazon; 5: new state transition discovered!

Task Characterization/Execution: understand the resource needs of a task; establish expected values and limits for task resource consumption; launch tasks on the correct resources; monitor task execution and resource consumption, interrupting tasks that reach limits; possibly re-launch the task on different resources.
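
To make "establish expected values and limits" concrete, here is a hedged sketch using the Work Queue Python bindings: declaring cores, memory, and disk on a task lets the worker place it appropriately and interrupt it if it exceeds the declared limits. The numbers are made-up examples, not measured values.

```python
# Sketch: declare expected resource limits on a task so the worker can
# schedule it onto suitable resources and enforce the limits at run time.
# The limits (1 core, 512 MB RAM, 1024 MB disk) are illustrative only.
import work_queue as wq

q = wq.WorkQueue(port=9123)

t = wq.Task("./mysim.exe < input.dat > output.dat")
t.specify_cores(1)        # expected CPU cores
t.specify_memory(512)     # expected RAM, in MB
t.specify_disk(1024)      # expected scratch disk, in MB
q.submit(t)

t = q.wait(60)
if t:
    print("return status:", t.return_status)
```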

Data Collection and Modeling: a monitor attached to each task produces a task record (e.g., RAM: 50M, Disk: 1G, CPU: 4 cores). Records from many tasks are aggregated into a task profile (typical, maximum, and minimum values and a probability distribution for each resource, such as RAM). Task profiles are combined with the workflow structure (tasks A through F) to form a workflow profile, which in turn drives the resource allocations for the workflow's tasks.
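
A minimal sketch of how records from many tasks might be reduced to a task profile; the record fields and the choice of the median as the "typical" value are illustrative assumptions, not the project's actual model.

```python
# Sketch: aggregate per-task resource records into a task profile holding
# minimum, typical (median), and maximum values for each resource.
from statistics import median

records = [                                   # invented example records
    {"ram_mb": 48, "disk_mb": 900,  "cores": 4},
    {"ram_mb": 52, "disk_mb": 1100, "cores": 4},
    {"ram_mb": 50, "disk_mb": 1000, "cores": 4},
]

def task_profile(records):
    profile = {}
    for key in records[0]:
        values = [r[key] for r in records]
        profile[key] = {"min": min(values),
                        "typ": median(values),   # "typical" modeled as the median
                        "max": max(values)}
    return profile

print(task_profile(records))
# {'ram_mb': {'min': 48, 'typ': 50, 'max': 52}, ...}
```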

Resource Monitor (RM). Invoked as "% resource_monitor mysim.exe", the monitor watches the local process tree and produces a summary file (fields include start, end, exit_type: normal, exit_status: 0, max_concurrent_processes: 16, wall_time, cpu_time, virtual_memory, resident_memory, swap_memory, bytes_read, and bytes_written) and a log file whose columns are #wall_clock(useconds), concurrent_processes, cpu_time(useconds), virtual_memory(kB), resident_memory(kB), swap_memory(kB), bytes_read, and bytes_written.
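
The summary format above is a series of "key: value" lines, so it is straightforward to load into a script; the following sketch assumes that format and a placeholder file name.

```python
# Sketch: parse a resource_monitor-style summary of "key: value" lines into a
# dictionary, converting numeric values where possible. File name is a placeholder.

def parse_summary(path):
    summary = {}
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue
            key, _, value = line.partition(":")
            value = value.strip()
            try:
                value = float(value) if "." in value else int(value)
            except ValueError:
                pass          # keep non-numeric fields (e.g. exit_type) as strings
            summary[key.strip()] = value
    return summary

s = parse_summary("mysim.summary")
print(s.get("exit_status"), s.get("max_concurrent_processes"))
```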

Monitoring Strategies. Summaries: getrusage and times; available only at the end of a process. Snapshots: reading /proc and measuring disk at given intervals; blind while waiting for the next interval. Events: linker wrapper to libc; fragile to modifications of the environment, and does not cover statically linked processes. Summaries and snapshots are indirect: they monitor how the world changes while the process tree is alive. Events are direct: they monitor what functions, and with which arguments, the process tree is calling.
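
As a toy illustration of the snapshot strategy (not the project's resource_monitor itself), this sketch polls /proc for a child process's resident memory at fixed intervals; the command and polling interval are arbitrary, and it is Linux-only.

```python
# Sketch: snapshot-style monitoring by polling /proc/<pid>/status at intervals.
# Shows why this strategy is "blind" between snapshots. Linux-only toy example.
import subprocess, time

def rss_kb(pid):
    """Resident set size (VmRSS, kB) read from /proc/<pid>/status."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

proc = subprocess.Popen(["./mysim.exe"])      # placeholder command
peak = 0
while proc.poll() is None:                    # child still running
    try:
        peak = max(peak, rss_kb(proc.pid))
    except FileNotFoundError:                 # child exited between checks
        break
    time.sleep(1)                             # blind until the next snapshot

proc.wait()
print("exit_status:", proc.returncode, "peak_resident_memory_kb:", peak)
```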

Portable Resource Management: the resource monitor (RM) wraps individual tasks in several systems. In Work Queue, the master's submit/wait loop dispatches tasks to workers, and the RM attached to each task reports per-task details (cpu, ram, disk). In Pegasus, the RM wraps each job and writes per-job summaries (Job-1.res, Job-2.res, Job-3.res). In Makeflow, running on another batch system, the RM wraps each rule and writes per-rule summaries (rule-1.res, rule-2.res, rule-3.res).

Resource Visualization of SHRiMP

Outliers Happen: BWA Example

Completing the Cycle: the historical repository supplies a task profile (typical, maximum, and minimum values and a distribution for each resource, such as RAM), which is used to allocate resources for the task (e.g., CPU: 10 s, RAM: 16 GB, DISK: 100 GB). The RM performs measurement and enforcement while the task runs, and the observed resources (e.g., CPU: 5 s, RAM: 15 GB, DISK: 90 GB) flow back into the repository. Exception handling asks: is it an outlier?
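
To illustrate the cycle in spirit (this is a sketch, not the project's implementation), the following code allocates from historical maxima, "runs" the task under a stub that stands in for RM-wrapped execution, feeds the observation back into the repository, and retries outliers with a larger allocation. Every name and number is invented.

```python
# Sketch of the allocate -> measure/enforce -> handle-exception cycle.
# run_with_limits() is a stub standing in for monitored execution; the
# resource names, growth factor, and history values are made up.
import random

GROWTH = 2.0            # enlarge the allocation when a task hits a limit
MAX_ATTEMPTS = 3

def run_with_limits(task, alloc):
    """Stub: returns (observed usage, whether any limit was exceeded)."""
    observed = {k: round(random.uniform(0.5, 1.2) * v, 1) for k, v in alloc.items()}
    return observed, any(observed[k] > alloc[k] for k in alloc)

def allocate_from_history(history):
    """Initial allocation: the historical maximum seen for each resource."""
    return {k: max(rec[k] for rec in history) for k in history[0]}

def run_task(task, history):
    alloc = allocate_from_history(history)
    for _ in range(MAX_ATTEMPTS):
        observed, exceeded = run_with_limits(task, alloc)
        history.append(observed)          # feed the historical repository
        if not exceeded:
            return observed               # completed within its allocation
        # Exception handling: treat as a possible outlier and retry larger.
        alloc = {k: v * GROWTH for k, v in alloc.items()}
    raise RuntimeError("task exceeded its limits on every attempt")

history = [{"cpu_s": 5.0, "ram_gb": 15.0, "disk_gb": 90.0}]   # observed so far
print(run_task("mysim", history))
```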

dV/dt Online Archive

Complete Workload Characterization: a workload is summarized by its aggregate shape, such as total memory and cores (e.g., 128 GB and 32 cores), per-task needs (e.g., 16 GB and 4 cores), the number of tasks, total hours, and aggregate I/O (e.g., 500 Gb/s or 5 Tb/s). We can then approach the questions: Can it run on this particular machine? What machines could it run on?
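
A back-of-the-envelope sketch of the "can it run on this machine?" question, comparing a workload summary against a machine description; the field names and numbers are invented for illustration.

```python
# Sketch: check whether a characterized workload fits on a machine, and how
# many tasks could run concurrently. All descriptions below are invented.

workload = {"cores_per_task": 4, "ram_gb_per_task": 16,
            "tasks": 1000, "disk_gb_total": 800}

machine = {"cores": 32, "ram_gb": 128, "disk_gb": 2000}

def fits(workload, machine):
    """True if one task fits in cores/RAM and the whole footprint fits on disk."""
    return (workload["cores_per_task"] <= machine["cores"] and
            workload["ram_gb_per_task"] <= machine["ram_gb"] and
            workload["disk_gb_total"] <= machine["disk_gb"])

def concurrency(workload, machine):
    """Tasks that can run at once without exceeding cores or RAM."""
    return min(machine["cores"] // workload["cores_per_task"],
               machine["ram_gb"] // workload["ram_gb_per_task"])

if fits(workload, machine):
    print("fits; up to", concurrency(workload, machine), "tasks at a time")
else:
    print("does not fit on this machine")
```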

Deliverables Relevant to OSG: user-level monitoring tools to report single-task resource usage (2 prototypes); an online archive for collecting, reporting, and predicting resource usage records (working); methods for run-time estimation and adaptation of resources (in progress); techniques integrated into workflow engines compatible with the OSG (Pegasus, Makeflow); interoperable components that can be mixed and matched according to local needs.

Acknowledgements. dV/dt Project PIs: Bill Allcock (ALCF), Ewa Deelman (USC), Miron Livny (UW), Douglas Thain (ND), Frank Wuerthwein (UCSD). Project Staff and Students: Gideon Juve (USC), Raphael Silva (USC), Sepideh Azarnoosh (USC), Ben Tovar (ND), Casey Robinson (ND).