PD2P: The DA Perspective
Kaushik De, Univ. of Texas at Arlington
S&C Week, CERN, Nov 30, 2010

Introduction
 PD2P was introduced ~5 months ago
 PD2P adiabatically, yet dramatically, changed the data distribution model for analysis users
 It was designed to have no impact on user experience
 Changes were rolled in gradually, behind the scenes
 So far, no complaints (or reactions) from users; maybe because they do not know about the changes we made!
 In this talk I will present:
   A brief overview (for more details see my talk yesterday: resId=1&materialId=slides&confId=76896)
   An analysis of usage patterns

Huge Rise in Analysis Activity
[Plot: analysis activity vs. time, PanDA backend only; LHC start marked]

Data Distribution is Very Important
 Most user analysis jobs run at Tier 2 sites
 Jobs are sent to the data
 We rely on pushing data out to Tier 2 sites promptly
 This is difficult, since there are many data formats and many sites
 We frequently adjusted the number of copies and the data types in April and May (see the sketch below)
 But Tier 2 sites were filling up too rapidly, and usage patterns were unpredictable
 Most datasets copied to Tier 2s were never used
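To make the tuning problem concrete, here is a hypothetical illustration of the push-model knobs that were being adjusted by hand; the format list and replica counts are placeholders, not the actual ATLAS computing-model values.

```python
# Hypothetical push-model policy: how many Tier 2 copies of each format are made
# automatically. The formats and numbers below are placeholders for illustration.
PUSHED_COPIES_PER_CLOUD = {
    "AOD": 2,
    "ESD": 1,
    "DESD": 1,
    "NTUP": 1,
}

def planned_tier2_volume_tb(volume_per_format_tb):
    """Estimate the Tier 2 disk consumed by automatic placement.
    volume_per_format_tb: {format: TB produced in one (re)processing pass}."""
    return sum(PUSHED_COPIES_PER_CLOUD.get(fmt, 0) * tb
               for fmt, tb in volume_per_format_tb.items())

# Example with made-up volumes: copies multiply quickly across formats and clouds.
print(planned_tier2_volume_tb({"AOD": 100, "ESD": 400, "NTUP": 50}))
```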

We Changed the Data Distribution Model
 Reduced the number of data copies pushed to Tier 2s
 Only a small fraction of AODs is sent automatically
 All other data types are pulled when needed by users
 Note: for production we have always pulled data as needed
 But users were insulated from this change:
   It did not affect the many critical ongoing analyses
   No delays in running jobs
   No change in user workflow
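The essence of the pull model fits in a few lines. The sketch below only illustrates the idea under stated assumptions; the function names, callbacks, and target-site policy are invented for this example, not the actual PanDA/DQ2 interfaces.

```python
# Minimal sketch of pull-based placement (the PD2P idea): user jobs still go to the
# data, but if no Tier 2 holds the input dataset, a replica is requested in the
# background so the user never waits. All names and policies are illustrative.

def broker_user_job(dataset, tier2_sites, has_replica, free_space_tb, subscribe):
    """Return the site where the job should run, triggering a pull if needed.

    has_replica(dataset, site) -> bool  : does the site already hold the data?
    free_space_tb(site)        -> float : available disk at the site
    subscribe(dataset, site)            : queue an asynchronous DDM subscription
    """
    sites_with_data = [s for s in tier2_sites if has_replica(dataset, s)]
    if sites_with_data:
        return max(sites_with_data, key=free_space_tb)   # jobs are sent to the data

    # No Tier 2 copy yet: pull one in the background; the current job can still run
    # where the data already is (e.g. the Tier 1), so the user workflow is unchanged.
    target = max(tier2_sites, key=free_space_tb)          # placeholder target policy
    subscribe(dataset, target)
    return None   # caller falls back to the existing copy; later jobs find the new replica
```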

Data Flow to Tier 2s
 The example above is from US Tier 2 sites
 Exponential rise in April and May, after the LHC start
 We changed the data distribution model at the end of June: PD2P
 Much slower rise since July, even as the luminosity grows rapidly

PD2P Subscriptions
[Plot]

Reuse of Files
[Plot]

Patterns of Data Usage – Part 1
 Interesting patterns are emerging by type of data
 LHC data is reused more often than MC data (not unexpected)

Patterns of Data Usage – Part 2
 Interesting patterns also appear by data format
 During the past ~5 months:
   All types of data are showing up: ESD, NTUP, AOD, and DED are the most popular
   But the highest reuse (counting files): ESD and NTUP
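As an illustration of the file counting behind these numbers, the sketch below tallies first use versus reuse per format from a flat access log; the record layout (dataset, format, files read) is an assumption for illustration, not the actual ATLAS monitoring schema.

```python
# Tally file-level reuse per data format from a flat access log.
# The (dataset, format, files_read) record layout is assumed for illustration only.
from collections import Counter

def reuse_by_format(access_log):
    """access_log: iterable of (dataset, fmt, files_read) records, one per user job."""
    seen = set()
    first_use, reuse = Counter(), Counter()
    for dataset, fmt, files_read in access_log:
        if dataset in seen:
            reuse[fmt] += files_read       # files read again after the dataset's first access
        else:
            seen.add(dataset)
            first_use[fmt] += files_read
    return first_use, reuse

# Example with made-up records: ESD shows the highest reuse, as on this slide.
log = [("data10.ESD.a", "ESD", 200), ("data10.ESD.a", "ESD", 200),
       ("mc10.AOD.b", "AOD", 80), ("user.NTUP.c", "NTUP", 30),
       ("user.NTUP.c", "NTUP", 30)]
print(reuse_by_format(log))
```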

Trends in Data Reuse
 The PD2P pull model needs no a priori assumption about which data types user analysis will need
 It automatically moves data based on the user workflow
 We now observe a shift back to ESD!

Plans for the Future
 Continue tuning PD2P as needed
 Develop PD2P for Tier 1s
   Currently, all data is pushed to Tier 1s through central subscriptions
   Storage is filling up rapidly; disk crises last month
   We need to reduce the amount of automatically subscribed data
   Start pulling data as needed for user analysis within the cloud
   We are working out the details of the algorithm
   More news soon
 Please provide user feedback if something can be improved
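Since the algorithm is still being worked out, the following is only a rough sketch of the general direction: pull a dataset to a cloud's Tier 1 when user analysis in that cloud actually requests it, instead of subscribing everything centrally. All function names are illustrative.

```python
# Sketch of pulling data to a Tier 1 on demand instead of subscribing everything
# centrally. The decision logic is a guess at the general direction, not the
# algorithm under development; all function names are illustrative.

def tier1_pull_on_demand(dataset, cloud, on_tier1_disk, in_cloud_replicas, subscribe_to_tier1):
    """Called when a user analysis job in `cloud` requests `dataset`."""
    if on_tier1_disk(dataset, cloud):
        return "already on Tier 1 disk"
    if in_cloud_replicas(dataset, cloud):
        return "served from an existing replica in the cloud"
    subscribe_to_tier1(dataset, cloud)      # asynchronous pull; no job is delayed
    return "Tier 1 subscription queued"
```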

Rise in Tier 1 Storage Use
[Plot]

How to Reduce Data Stored at Tier 1s
 The largest amount of Tier 1 data is primary
   We must reduce automatic subscriptions
 The second largest category of Tier 1 data is secondary
   We suspect much of this data is never used
 Go to a pull model, using PD2P, for secondary data (and also for part of the data currently labeled primary)
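To make the proposal concrete, here is a toy decision rule for the primary/secondary split; the list of formats kept under central subscription and the rule itself are placeholders, not an agreed ATLAS policy.

```python
# Toy version of the proposed Tier 1 placement rule: a reduced set of "primary"
# formats keeps automatic central subscriptions, everything else is only copied
# when analysis actually requests it (PD2P pull). Format lists are placeholders.

PRIMARY_FORMATS = {"RAW", "AOD"}    # assumed to remain centrally subscribed

def tier1_placement(dataset_format, requested_by_analysis):
    if dataset_format in PRIMARY_FORMATS:
        return "central subscription"
    if requested_by_analysis:
        return "PD2P pull"
    return "no Tier 1 copy"          # unused secondary data costs no disk

# Example: an ESD dataset that nobody reads never occupies Tier 1 disk.
print(tier1_placement("ESD", requested_by_analysis=False))   # -> "no Tier 1 copy"
```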