VO Experiences with Open Science Grid Storage OSG Storage Forum | Wednesday September 22, 2010 (10:30am)

 LIGO instruments generate more than a TB of data each day
 Roughly 15% to 20% is analyzed in near real time
 This data is “replicated” in near real time to all LIGO Data Grid sites
 Data management is in the domain of the LIGO Data Grid, not the application!
 Binary inspiral workflows contain many tens of thousands of jobs just to analyze a few weeks of data
 Workflows (and the architecture of the LIGO Data Grid) are designed around analysis of weeks of data at a time
 The remaining data are relevant to time intervals of scientific significance
 LIGO will finish the second science run this October and will begin an extensive upgrade to the Advanced LIGO instrumentation
 ~10x more sensitive
 ~1000x larger window on the universe and corresponding event rates (a one-line scaling check follows this slide)
 Workflow designs may be significantly larger in the future, or drastically redesigned to support LIGO data in an open data era
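The step from roughly 10x better sensitivity to roughly 1000x the event rate is a volume-scaling argument: the detection range grows about linearly with strain sensitivity, and the surveyed volume (and hence the expected event rate, assuming a roughly uniform source density) grows with the cube of the range:

    r \;\to\; 10\,r \quad\Longrightarrow\quad V \propto r^{3} \;\to\; (10\,r)^{3} = 10^{3}\,r^{3} \approx 1000\,V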

 Data were transferred during execution of the binary inspiral workflow
 Pegasus optimized this for storage minimization, not computation
 One DAG node was dedicated to transferring the entire data set of a workflow into $OSG_DATA (a sketch of this DAG structure follows this slide)
 The workflow is executed in $OSG_DATA, since most DAG jobs need direct access to the data
 Space in $OSG_DATA is very limited
 Before useful SEs appeared on OSG, no OSG site was able to cope with the science-scale workflows of the binary inspiral
 The early SEs to appear on the OSG either did not allow the LIGO VO to run there or required changes to the scientific codes and their execution … unpleasant to think about for the science review committees!
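A minimal sketch of the structure described above, not the actual LIGO/Pegasus output: one stage-in node transfers the data set into $OSG_DATA and is made a parent of every analysis node, so no analysis job starts before the data are in place. All node and submit-file names are hypothetical.

    # Sketch only: generate a DAGMan file with a dedicated stage-in node
    # that must finish before any analysis node runs.
    analysis_jobs = ["inspiral_0000", "inspiral_0001", "inspiral_0002"]  # placeholder names

    with open("inspiral.dag", "w") as dag:
        dag.write("JOB stage_in stage_in.sub\n")        # node that transfers data into $OSG_DATA
        for name in analysis_jobs:
            dag.write(f"JOB {name} {name}.sub\n")       # analysis nodes
        dag.write("PARENT stage_in CHILD " + " ".join(analysis_jobs) + "\n")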

 The LIGO data set is pre-staged into an OSG storage element
 Britta takes care of this with custom scripts that rely on her being a member of the LIGO Scientific Collaboration … access restrictions
 During execution of the workflow, the data are symlinked into the execution directory (an illustrative sketch follows this slide)
 The Pegasus workflow planner takes care of doing this
 If the mapping between file spaces is known
 The SE file system needs to be mounted on all worker nodes
 In the future we hope Pegasus functionality will expand to support “second-level staging” from the SE to the worker nodes … only the data needed by the jobs running on a worker node are “staged” from the SE to that node
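A sketch of the symlinking step, assuming the SE file system is mounted at a known path on the worker node and the mapping from file names to SE paths is available; Pegasus does this as part of planning, so the code below (with made-up paths and frame-file names) only illustrates the idea.

    import os

    # Hypothetical paths: SE file system mounted on the worker node, and the
    # scratch directory in which the workflow executes.
    SE_MOUNT = "/mnt/se/ligo/frames"
    EXEC_DIR = "/scratch/ligo/run_001"

    def link_inputs(file_names):
        """Symlink pre-staged inputs from the SE mount into the execution
        directory so jobs see them as local files, without copying."""
        os.makedirs(EXEC_DIR, exist_ok=True)
        for name in file_names:
            target = os.path.join(EXEC_DIR, name)
            if not os.path.lexists(target):
                os.symlink(os.path.join(SE_MOUNT, name), target)

    # Example frame-file names, purely illustrative
    link_inputs(["H1-RDS-968400000-128.gwf", "L1-RDS-968400000-128.gwf"])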

 Transferring LIGO data:
 Which data sets need to be transferred is highly dependent on the specifics of a particular workflow … since storage has to be “leased” with a fixed size and life expectancy, the data transfers currently have to be specified precisely for the specific workflow
 File transfers and md5sum checks are submitted from an OSG submit host. File transfers are executed locally on the OSG submit host via third-party transfers; md5sum checks are executed remotely on the OSG CE associated with the SE (competition for slots means this can take a very long time)
 An RLS catalog at Caltech is populated with information about the files in the OSG SE … this is needed by the Pegasus workflow planner
 At present both globus-url-copy and srm-copy are being used, and performance issues have been noticed in transfers (a sketch of the transfer-and-verify pattern follows this slide)
 Britta has posted details of what is involved in the transfer at:
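A hedged sketch of the transfer-plus-verification pattern described above: a third-party copy driven from the submit host, followed by an md5sum comparison against the catalogued value. The endpoints and checksum are placeholders, and in the setup on this slide the md5sum step actually runs as a grid job on the CE next to the SE; it is shown as a plain local call only for brevity.

    import subprocess

    # Placeholder URLs for a source archive and a destination storage element.
    SRC = "gsiftp://archive.example.edu/data/H1-RDS-968400000-128.gwf"
    DST = "gsiftp://se.example.edu/ligo/H1-RDS-968400000-128.gwf"

    def third_party_copy(src, dst):
        """Drive a transfer from the submit host with globus-url-copy
        (srm-copy could be substituted when the endpoint is an SRM door)."""
        subprocess.run(["globus-url-copy", src, dst], check=True)

    def md5_matches(local_path, expected):
        """Compare the md5sum of a transferred file with the expected value."""
        out = subprocess.run(["md5sum", local_path], check=True,
                             capture_output=True, text=True)
        return out.stdout.split()[0] == expected

    third_party_copy(SRC, DST)
    # md5_matches("/mnt/se/ligo/H1-RDS-968400000-128.gwf", "<expected checksum>")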

 To date, only a handful of OSG Storage Elements have been able to meet the LIGO requirements for this application:
 LIGO’s ITB, Nebraska’s Firefly, and Caltech’s CMS Tier 2 have demonstrated working prototypes of the analysis
 UCSD’s Tier 2, TTU_ANTAEUS, and UMissHEP are in progress
 Not all of these are known to permit science-scale workflows, but they are providing useful experience with using SE technology with this application
 Britta has posted statistics from the testing at: TRANSFERS/STATS/

 We have successfully tested several workflows using the glidein tool Corral, developed alongside Pegasus
 The glidein tool has also been tested on file-transfer DAGs
 Use of glideins may improve performance of the LIGO inspiral workflow on the OSG, since a number of inspiral jobs are of very short duration (at least while prototyping); a back-of-the-envelope illustration follows this slide
 Glideins may also improve performance of transfer DAGs, since md5sum jobs may sit in remote queues for a long time at busy OSG sites
 Britta has metrics for LIGO experiences with glideins at: TS
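A back-of-the-envelope illustration, not a measurement, of why glideins help when jobs are short: with one grid submission per job every job pays the scheduling and queueing latency, whereas a glidein pays it once per slot and then streams many short jobs through that slot. All numbers below are made up purely to show the arithmetic.

    # Illustrative-only numbers: many short jobs each paying the grid latency
    # versus glideins that pay it once per slot.
    n_jobs = 10_000          # short inspiral-style jobs
    runtime_s = 5 * 60       # 5-minute job runtime
    latency_s = 10 * 60      # average queue/scheduling latency per grid submission
    glidein_slots = 200      # hypothetical number of concurrently held glidein slots

    direct_hours = n_jobs * (runtime_s + latency_s) / 3600
    glidein_hours = (glidein_slots * latency_s + n_jobs * runtime_s) / 3600

    print(f"aggregate slot-hours, one grid submission per job: {direct_hours:.0f}")
    print(f"aggregate slot-hours, jobs streamed through glideins: {glidein_hours:.0f}")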

 In its present form, LIGO’s binary inspiral application needs large amounts of storage closely coupled to the worker nodes
 Few OSG sites with storage elements have been available to bootstrap this into a viable, scalable production solution that meets the requirements of LIGO’s science publication environment
 A special task force has been pulled together to understand the challenges and develop the right tools and architecture to bring this to the next level.
 …THE END