High Throughput Computing for Astronomers

Presentation transcript:

High Throughput Computing for Astronomers Mats Rynge Information Sciences Institute, University of Southern California rynge@isi.edu

Outline: High Throughput Computing; Condor; DAGMan; Pegasus Workflow Management System; Periodogram Workflow / Open Science Grid; Galactic Plane Workflow / XSEDE; Cloud

Why High Throughput Computing? For many experimental scientists, scientific progress and quality of research are strongly linked to computing throughput. In other words, they are less concerned about instantaneous computing power. Instead, what matters to them is the amount of computing they can harness over a month or a year --- they measure computing power in units of scenarios per day, wind patterns per week, instructions sets per month, or crystal configurations per year. Slide credit: Miron Livny

High Throughput Computing is a 24-7-365 activity. FLOPY ≠ (60*60*24*7*52)*FLOPS Slide credit: Miron Livny
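For scale, the product in the inequality simply counts the seconds in a year; a minimal worked expansion (an illustration added here, not part of the original slide):

    % Seconds per year, as counted on the slide
    \[ 60 \times 60 \times 24 \times 7 \times 52 = 31{,}449{,}600 \approx 3.1 \times 10^{7}\ \mathrm{s} \]
    % An ideal machine sustaining F FLOPS for the whole year would deliver
    \[ \mathrm{FLOPY}_{\mathrm{ideal}} = 3.1 \times 10^{7} \times F \]
    % The inequality stresses that real yearly throughput (FLOPY) rarely
    % reaches this peak-rate-times-seconds product.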

HTC Toolbox: Condor – matchmaking and scheduler for HTC workloads; DAGMan – Directed Acyclic Graph Manager, for workloads with structure; Pegasus – Workflow Management System

Condor Users submit their serial or parallel jobs to Condor; Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion. It provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management.
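As a concrete illustration (not taken from the original slides), a minimal Condor submit description might look like the sketch below; the executable and file names are hypothetical placeholders.

    # Hypothetical submit file: run 10 instances of a serial analysis program
    universe     = vanilla
    executable   = periodogram.py
    arguments    = $(Process)
    # Per-job stdout/stderr and a shared event log
    output       = out/job.$(Process).out
    error        = out/job.$(Process).err
    log          = periodogram.log
    request_cpus = 1
    # Place 10 jobs in the queue
    queue 10

Submitting this file with condor_submit places the 10 jobs in the queue, and condor_q shows their progress.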

DAGMan Directed Acyclic Graph Manager Handles tasks with dependencies
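For illustration (again not from the original deck), a small DAG description with one job that must finish before two others, followed by a final combining job, could look like this; the node names and submit-file names are hypothetical:

    # diamond.dag: B and C run only after A completes; D runs after both
    JOB A prepare.sub
    JOB B analyze_left.sub
    JOB C analyze_right.sub
    JOB D combine.sub
    PARENT A CHILD B C
    PARENT B C CHILD D

Running condor_submit_dag diamond.dag hands the DAG to DAGMan, which submits each node's Condor job once its parents have succeeded.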

Pegasus Workflow Management System Abstract workflows are Pegasus's input workflow description: a workflow "high-level language" that only identifies the computation, devoid of resource descriptions and data locations. Pegasus acts as a workflow "compiler" (plan/map): its target is DAGMan DAGs and Condor submit files; it transforms the workflow for performance and reliability, automatically locates physical locations for both workflow components and data, and provides runtime provenance.
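As a hedged sketch of the planning step (the file, directory, and site names below are placeholders, and exact options vary by Pegasus version), the abstract workflow is handed to pegasus-plan, which emits the DAGMan DAG and Condor submit files and can optionally submit them:

    # Compile the abstract workflow (DAX) into a DAGMan DAG plus Condor
    # submit files for a chosen execution site, then hand it to DAGMan.
    pegasus-plan \
        --conf pegasus.properties \
        --dax montage.dax \
        --dir work \
        --sites condorpool \
        --output-site local \
        --submit

The pegasus-status and pegasus-analyzer tools can then be used to monitor and debug the running workflow.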

Original workflow: 15 compute nodes, devoid of resource assignment. Resulting workflow mapped onto 3 Grid sites: 13 data stage-in nodes; 11 compute nodes (reduced by 4 based on available intermediate data); 8 inter-site data transfers; 14 data stage-out nodes to long-term storage; 14 data registration nodes (data cataloging).

Periodogram Workflow The scientific goal is to generate an atlas of periodicities of the public Kepler data. The atlas will be served through the NASA Star and Exoplanet Database (NStED), along with a catalog of the highest-probability periodicities culled from the atlas. 1.1 million total tasks; 108 sub-workflows; input: 323 GB; outputs: 2 TB; 100,000 CPU hours. Wall-time-based job clustering (simple binning) with a target of 1 hour per clustered job; a sketch of the idea follows below.
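A minimal sketch of what wall-time-based "simple binning" could look like, assuming per-task runtime estimates in seconds and a 1-hour target per clustered job (this illustrates the idea only and is not Pegasus's actual clustering code):

    # Illustrative greedy binning: pack task runtime estimates (seconds)
    # into clusters whose total estimated wall time is about one hour.
    TARGET_SECONDS = 3600  # hypothetical target wall time per clustered job

    def bin_tasks(task_runtimes):
        """Group per-task runtime estimates into ~1-hour bins, in order."""
        bins, current, total = [], [], 0.0
        for runtime in task_runtimes:
            if current and total + runtime > TARGET_SECONDS:
                bins.append(current)   # close the current cluster
                current, total = [], 0.0
            current.append(runtime)
            total += runtime
        if current:
            bins.append(current)
        return bins

    # Example: 90 tasks of ~2 minutes each cluster into 3 one-hour jobs
    print(len(bin_tasks([120] * 90)))  # -> 3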

Periodogram Jobs Running on the Open Science Grid 5.5 CPU years in 3 days

The Open Science Grid A framework for large-scale distributed resource sharing, addressing the technology, policy, and social requirements of sharing. OSG is a consortium of software, service and resource providers and researchers, from universities, national laboratories and computing centers across the U.S., who together build and operate the OSG project. The project is funded by the NSF and DOE, and provides staff for managing various aspects of the OSG. It brings petascale computing and storage resources into a uniform grid computing environment and integrates computing and storage resources from over 100 sites in the U.S. and beyond. Open Science Grid is a framework for distributed resource sharing; one way of thinking of OSG is as a grid of grids, that is, your campus and regional grids come together into one national grid. Also note that OSG is a consortium, and that OSG does not own any compute resources. Not that we are picky, but it is actually incorrect to call something an OSG resource; the correct way is to say, for example, the UNM_HPC resource, which is available through OSG. OSG today has over 70 resources online.

The Evolution of OSG

Using OSG Today: Astrophysics, Biochemistry, Bioinformatics, Earthquake Engineering, Genetics, Gravitational-wave physics, Mathematics, Nanotechnology, Nuclear and particle physics, Text mining, and more…

Galactic Plane Workflow A multiwavelength infrared image atlas of the galactic plane, composed of images at 17 different wavelengths from 1 µm to 70 µm, processed so that they appear to have been measured with a single instrument observing all 17 wavelengths. 360° x 40° coverage; 18 million input files; 86 TB output dataset; 17 workflows, each one with 900 sub-workflows.

Survey                                          Wavelengths (µm)
Spitzer Space Telescope GLIMPSE I, II and 3D    3.6, 4.5, 5.8, 8.0
Spitzer Space Telescope MIPSGAL I, II           24, 70
All Sky Surveys 2MASS                           1.2, 1.6, 2.2
All Sky Surveys MSX                             8.8, 12.1, 14.6, 21.3
All Sky Surveys WISE †                          3.4, 4.6, 12, 22
† Galactic Plane data scheduled for release Spring 2012

The Extreme Science and Engineering Discovery Environment (XSEDE) 9 supercomputers, 3 visualization systems, and 9 storage systems provided by 16 partner institutions. XSEDE resources are allocated through a peer-reviewed process, open to any US open science researcher (or collaborators of US researchers) regardless of funding source. XSEDE resources are provided at NO COST to the end user through NSF funding (~$100M/year). Slide credit: Matthew Vaughn

Clouds Run your own custom virtual machines. But what is provided? What is missing? Science clouds: FutureGrid. Commercial clouds: Amazon EC2, Google Compute, RackSpace.

The application of cloud computing to astronomy: A study of cost and performance, Berriman et al.

Cloud – what do I have to provide? Resource Provisioning

Cloud – Tutorial This tutorial will take you through the steps of launching the Pegasus Tutorial VM on Amazon EC2 and running a simple workflow. This tutorial is intended for new users who want to get a quick overview of Pegasus concepts and usage. A preconfigured virtual machine is provided so that minimal software installation is required. The tutorial covers the process of starting the VM and of creating, planning, submitting, monitoring, debugging, and generating statistics for a simple workflow. Oregon datacenter. Image: ami-8643ccb6 http://pegasus.isi.edu/wms/docs/tutorial/
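To make the launch step concrete, one way to start that image from the command line is sketched below with the AWS CLI; the instance type, key pair, and security group are placeholders, and the AMI ID from the slide may no longer be available:

    # Launch the tutorial AMI in the Oregon (us-west-2) region
    aws ec2 run-instances \
        --region us-west-2 \
        --image-id ami-8643ccb6 \
        --instance-type m1.small \
        --key-name my-keypair \
        --security-group-ids sg-0123456789abcdef0

Once the instance is running, connect to its public address over SSH and follow the tutorial at the URL above.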

Thank you! Pegasus: http://pegasus.isi.edu pegasus-support@isi.edu