OSG Area Coordinator’s Report: Workload Management October 6th, 2010 Maxim Potekhin BNL

2 Workload Management: Overview

Areas of Activity: ITB, Engagement, Panda Monitoring Upgrade

Status of current initiatives:
- Panda/ITB:
  - Continued use of the "ITB robot" to run the test suite via Panda; a web service was implemented for data archiving (Suchandra)
  - Integration with the CERN Analysis Dashboard: Panda information is now collected and fed to the Dashboard
- Engagement:
  - DUSEL Long-Baseline Neutrino Experiment (LBNE) and Daya Bay:
    - First round of testing completed with real jobs
    - Preparations currently underway for a production run using Panda (resolving data validation, error reporting, and other issues)
    - Will expand to NERSC
  - CHARMM: expansion of job submission to more sites
    - Multiple sites have been set up (Jose, Tim, Alden)
    - 15 sites are working for CHARMM (automatic load distribution thanks to the pilot framework)
    - 43k CPU-hours utilized monthly
    - Fresh runs with more models to compute are planned for the near future
- Upgrade of the Panda monitoring system (Maxim):
  - Migration of the Panda monitoring system to the new platform
  - New sections added to the Django/AJAX/jQuery prototype
  - Code refactored for cooperative development with the CERN team (1.5 ATLAS developers joining the project)
  - Server-side caching implemented with a few types of back-end (e.g. a memcached server); see the caching sketch after this list
  - Oracle-specific optimization integrated into the Django server code; see the query sketch after this list
  - Beta testing to commence late in the year
  - Presented at ATLAS Monitoring Meetings
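For context on the server-side caching item above: the slide only states that caching with a memcached back end was added to the Django-based monitor. The following is a minimal sketch of what such a setup typically looks like in Django; the cache location, timeout, view name, and the collect_job_summary() helper are illustrative assumptions, not the actual Panda monitor code.

    # settings.py -- illustrative memcached-backed cache (placeholder host/port and timeout)
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
            "LOCATION": "127.0.0.1:11211",
            "TIMEOUT": 300,  # seconds to keep a cached entry
        }
    }

    # views.py -- cache the rendered response of an expensive monitor page
    import json
    from django.http import HttpResponse
    from django.views.decorators.cache import cache_page

    def collect_job_summary():
        """Hypothetical helper standing in for the real per-site job aggregation."""
        return {"site": "ANALY_EXAMPLE", "running": 120, "finished": 4500, "failed": 37}

    @cache_page(60 * 5)  # serve the cached copy for 5 minutes before re-querying the database
    def job_summary(request):
        return HttpResponse(json.dumps(collect_job_summary()),
                            content_type="application/json")

With this pattern, repeated hits on the same monitor page are served from memcached rather than triggering fresh database queries, which is the main motivation for caching in a monitoring front end.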
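Likewise, the slide mentions Oracle-specific optimization in the Django server code without detail. One common pattern, shown here purely as an assumption about what such a change could look like, is to bypass the ORM for a hot query and issue hand-tuned SQL with an Oracle optimizer hint through Django's database connection; the table and column names and the hint are illustrative only.

    # Hypothetical Oracle-tuned query issued through Django's DB API.
    from django.db import connection

    def count_jobs_by_state(hours_back=12):
        """Return {job_status: count} for jobs modified in the last `hours_back` hours."""
        sql = """
            SELECT /*+ FIRST_ROWS(100) */ jobstatus, COUNT(*)
            FROM jobsactive4
            WHERE modificationtime > SYSDATE - %s / 24
            GROUP BY jobstatus
        """
        cursor = connection.cursor()
        cursor.execute(sql, [hours_back])  # Django converts %s to an Oracle bind variable
        return dict(cursor.fetchall())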

3 Panda Monitor

4 Continued Development of Panda Monitoring Client

5

6 Overview: Workload Management Issues / Concerns
- The Panda monitoring upgrade is a labor-intensive project due to the large amount of legacy code
- Looking forward to conducting the LBNE simulation run at BNL and expanding LBNE runs to more sites