
1 Distributed Analysis Tutorial Dietrich Liko

2 Overview
• Three grid flavors in ATLAS
  - EGEE
  - OSG
  - NorduGrid
• Distributed Analysis activities
  - GANGA/LCG
  - PANDA/OSG
  - Other tools
• How to access the grid?
  - Certificate
  - VOMS
• How to find your data?
  - Where is the data stored?
  - Which data is really available?

3 Three grids …
• Grids have different middleware
  - Different software to submit jobs
  - Different catalogs to store the data
• We have to aim to hide these differences from the ATLAS user

4 EGEE
• Job submission via the LCG Resource Broker
  - The new gLite RB is on its way …
• LFC file catalog
• Condor-G submission is also possible
  - Requires some expertise and has no support from the service provider

5 Resource Broker Model
(Diagram: RB, CE)
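In the Resource Broker model the user hands the job to the broker, which selects a CE and pushes the job to it before it starts. The toy Python sketch below illustrates only that push-style matchmaking step. The `ComputingElement` class, the rule of ranking CEs by free slots, and the host names are hypothetical; the real LCG RB matches JDL requirements against information-system data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ComputingElement:
    # Hypothetical CE description used only for this illustration.
    name: str
    free_slots: int
    supported_vos: Set[str] = field(default_factory=set)


def match_ce(vo: str, ces: List[ComputingElement]) -> Optional[ComputingElement]:
    """Pick a CE for the job *before* it runs (push model).

    Candidates must accept the job's VO and have a free slot; among them the
    CE with the most free slots wins (an assumed ranking rule).
    """
    candidates = [ce for ce in ces if vo in ce.supported_vos and ce.free_slots > 0]
    return max(candidates, key=lambda ce: ce.free_slots, default=None)


def push_job(vo: str, ces: List[ComputingElement]) -> Optional[str]:
    """Send the job to the matched CE; return its name, or None if nothing matches."""
    ce = match_ce(vo, ces)
    if ce is None:
        return None
    ce.free_slots -= 1  # the job now occupies a slot at the chosen CE
    return ce.name


if __name__ == "__main__":
    ces = [
        ComputingElement("ce-a.example.org", 2, {"atlas"}),
        ComputingElement("ce-b.example.org", 5, {"atlas", "cms"}),
        ComputingElement("ce-c.example.org", 10, {"cms"}),
    ]
    print(push_job("atlas", ces))  # ce-b.example.org
```

The decision is made once, up front, from the broker's view of the CEs; if that view is stale, the job still waits in the chosen CE's queue.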

6 OSG/PANDA
• PANDA is an integrated production and distributed analysis system
  - Pilot-job based, similar to DIRAC and AliEn
• Simple file catalogs at sites
• Again, Condor-G submission is possible

7 PANDA Model
(Diagram: task queue, CE)
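In the pilot model the order is reversed: pilot jobs are sent to the CEs, and only once a pilot is running on a worker node does it ask the central task queue for real work. The toy Python sketch below illustrates that pull step, including a "multitasking" pilot that runs several jobs in one batch slot. `TaskQueue` and `run_pilot` are hypothetical names; this is not PANDA code.

```python
from collections import deque
from typing import List, Optional, Tuple


class TaskQueue:
    """Central queue of waiting user jobs (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._jobs = deque()

    def add(self, job: str) -> None:
        self._jobs.append(job)

    def fetch(self) -> Optional[str]:
        """Hand the oldest waiting job to a pilot, or None if the queue is empty."""
        return self._jobs.popleft() if self._jobs else None


def run_pilot(site: str, queue: TaskQueue, max_jobs: int = 1) -> List[Tuple[str, str]]:
    """A pilot already running at `site` pulls jobs until the queue is empty
    or it has run `max_jobs` of them (max_jobs > 1 models a multitasking pilot)."""
    done = []
    while len(done) < max_jobs:
        job = queue.fetch()
        if job is None:
            break  # no work left: the pilot exits and frees the batch slot
        done.append((site, job))  # a real pilot would execute the job here
    return done


if __name__ == "__main__":
    q = TaskQueue()
    for j in ("job1", "job2", "job3"):
        q.add(j)
    print(run_pilot("SITE_A", q, max_jobs=2))  # [('SITE_A', 'job1'), ('SITE_A', 'job2')]
    print(run_pilot("SITE_B", q))  # [('SITE_B', 'job3')]
```

Because the job is bound to a slot only when a pilot is already running, user jobs never wait in a remote batch queue.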

8 NorduGrid
• ARC middleware for job submission
  - Powerful and simple
• RLS file catalog
• At this time mainly used for production; not yet a place for general ATLAS distributed analysis

9 ARC Model
(Diagram: CE)

10 How can we live with that?
• A data management layer to hide these differences: Don Quijote 2 (DQ2)
• Tools that aim to hide the difficulties of submitting jobs
  - pathena/PANDA on OSG
  - GANGA on LCG
• In the future, better interoperability
  - At the level of the ATLAS tools
  - At the level of the middleware
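The idea of a data management layer can be sketched as one common interface in front of catalogs that work differently (LFC on EGEE, simple per-site catalogs with PANDA, RLS on NorduGrid). The Python sketch below is a toy illustration of that adapter pattern under simplified, assumed catalog behavior; the class names and URLs are hypothetical, and it does not describe how DQ2 is actually implemented.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class ReplicaCatalog(ABC):
    """Common interface the data management layer exposes to users."""

    @abstractmethod
    def replicas(self, lfn: str) -> List[str]:
        """Return physical file names (PFNs) for a logical file name (LFN)."""


class CentralCatalog(ReplicaCatalog):
    """Stand-in for a grid-wide catalog such as LFC (EGEE) or RLS (NorduGrid)."""

    def __init__(self, table: Dict[str, List[str]]) -> None:
        self._table = table

    def replicas(self, lfn: str) -> List[str]:
        return list(self._table.get(lfn, []))


class SiteCatalog(ReplicaCatalog):
    """Stand-in for a simple per-site catalog: one storage prefix, a set of LFNs."""

    def __init__(self, storage_prefix: str, lfns: Set[str]) -> None:
        self._prefix = storage_prefix
        self._lfns = lfns

    def replicas(self, lfn: str) -> List[str]:
        return [f"{self._prefix}/{lfn}"] if lfn in self._lfns else []


def find_replicas(lfn: str, catalogs: List[ReplicaCatalog]) -> List[str]:
    """Query every catalog through the same interface and merge the results."""
    found: List[str] = []
    for catalog in catalogs:
        found.extend(catalog.replicas(lfn))
    return found
```

User-facing tools call only `find_replicas`; adding another grid means adding one more adapter class, not changing the tools.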

11 Distributed Analysis
• Data analysis
  - AOD & ESD analysis
  - TAG-based analysis
  - pathena/PANDA
  - GANGA/LCG
• User production
  - Prodsys
  - LJSF
  - GANGA (DQ2 integration)

12 pathena/PANDA
• Lightweight client
• Integrated into the Athena release
  - Very nice work
• A lot of work has been done to better support user jobs
  - Short queues, multitasking pilots, etc.
• A large set of data is available
• Available for some time already
• Tadashi will tell you more about it

13 GANGA/LCG
• Text UI & GUI
  - A pathena-like interface is available
• Multiple backends
  - LCG/EGEE
  - LSF (also works with CAT queues)
  - PBS
  - and others
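The point of multiple backends is that the job description stays the same and only the execution target changes. The toy Python sketch below shows that separation by turning one job into either a local command line or an LSF submission (`bsub -q <queue> ...`). The `Job` and backend classes, the job options file, and the queue name are hypothetical; GANGA's own interface differs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """What to run, independent of where it runs (hypothetical description)."""
    executable: str
    args: List[str] = field(default_factory=list)


class LocalBackend:
    """Run the job directly on the current machine."""

    def command(self, job: Job) -> List[str]:
        return [job.executable, *job.args]


class LSFBackend:
    """Submit the job to an LSF queue with bsub."""

    def __init__(self, queue: str) -> None:
        self.queue = queue

    def command(self, job: Job) -> List[str]:
        return ["bsub", "-q", self.queue, job.executable, *job.args]


def submission_command(job: Job, backend) -> List[str]:
    """Same job, different backend: only the backend object changes."""
    return backend.command(job)


if __name__ == "__main__":
    job = Job("athena.py", ["MyJobOptions.py"])  # hypothetical job options file
    print(submission_command(job, LocalBackend()))
    print(submission_command(job, LSFBackend("example_cat_queue")))  # hypothetical queue
```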

14 Progress on LCG
• Many datasets available at CERN and LYON
• Job priorities and short queues are being implemented
  - Short queues: CERN, LYON, NIKHEF, FZK, RAL and some Tier-2s
  - Priorities: NIKHEF, CERN, IFIC (PPS)
• As of today one can perform distributed analysis at CERN and in LYON
• We hope that within this year all the other Tier-1 centers and some Tier-2s will follow
  - See later this week in the Tier-1/Tier-2 coordination
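Short queues and job priorities both aim to let short analysis jobs start without waiting behind long production jobs. The toy Python sketch below expresses that ordering with a priority queue. The group names, the 60-minute short-queue limit, and the ordering rule are assumptions for illustration; on LCG this is configured in each site's batch system, not in user code.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple

SHORT_LIMIT_MIN = 60  # assumed short-queue walltime limit, in minutes
GROUP_PRIORITY = {"analysis": 0, "production": 1}  # assumed: lower value starts first


class ToyScheduler:
    def __init__(self) -> None:
        self._heap: List[Tuple[Tuple[int, int, int], str]] = []
        self._seq = count()  # tie-breaker: FIFO order among otherwise equal jobs

    def submit(self, job_id: str, group: str, walltime_min: int) -> None:
        # Short jobs go ahead of long ones; within each, analysis before production.
        queue_rank = 0 if walltime_min <= SHORT_LIMIT_MIN else 1
        key = (queue_rank, GROUP_PRIORITY[group], next(self._seq))
        heapq.heappush(self._heap, (key, job_id))

    def next_job(self) -> Optional[str]:
        """Return the job that should start next, or None if nothing is waiting."""
        return heapq.heappop(self._heap)[1] if self._heap else None
```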

15 GANGA Status
• Significant developments over the summer
  - Data available at CERN and LYON; GANGA would work on most sites
  - Short queues/priorities
  - Full DQ2 integration
  - Transparent access to local resources (e.g. CAT queues)
• Still in the pipeline
  - Move data and priorities to all Tier-1s
  - Get the gLite Resource Broker into production
  - Start iterations with users

16 Tools for simulation
• GANGA (see later today)
• LJSF
• Prodsys executor
• Condor-based submission systems

17 Dashboard Monitoring
• We are setting up a framework to monitor distributed analysis jobs
  - MonALISA based (OSG, LCG)
  - R-GMA
  - Imperial College DB
  - Production system
• We plan to instrument the submission systems to understand how they are used
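A dashboard that collects from several monitoring sources has to merge records describing the same job. The toy Python sketch below assumes every feed has already been normalized to records with `job_id`, `site`, and `status` fields (a hypothetical schema), keeps the last status seen for each job, and counts jobs per site and status. It is not the dashboard's actual data model.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def summarize(feeds: Mapping[str, Iterable[dict]]) -> Dict[Tuple[str, str], int]:
    """Count jobs per (site, status) across several monitoring feeds.

    Feeds are read in the order given; if two feeds report the same job_id,
    the later record overrides the earlier one (an assumed merge rule).
    """
    latest: Dict[str, Tuple[str, str]] = {}
    for _source, records in feeds.items():
        for rec in records:
            latest[rec["job_id"]] = (rec["site"], rec["status"])
    return dict(Counter(latest.values()))
```

De-duplicating by job ID matters because the same job may be visible to more than one source; counting raw records would inflate the totals.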

18 Since September 1st …

19 Log in to the grid
• grid-proxy-init
  - Basic access as of today
• voms-proxy-init -voms atlas
  - Can give access to special rights
  - Today: job priorities on LCG to separate production from analysis
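The two login variants differ only in the command that creates the proxy. The minimal Python wrapper below builds the command lines listed on this slide and runs the chosen one. It assumes the grid UI environment has already been set up so the tools are on PATH; the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def proxy_command(use_voms: bool, vo: str = "atlas") -> List[str]:
    """Return the command line for creating a grid proxy."""
    if use_voms:
        # VOMS proxy: carries VO attributes (used today for job priorities on LCG)
        return ["voms-proxy-init", "-voms", vo]
    # Plain proxy: basic grid access
    return ["grid-proxy-init"]


def create_proxy(use_voms: bool = True, vo: str = "atlas") -> int:
    """Run the proxy tool interactively (it prompts for the certificate pass phrase)."""
    cmd = proxy_command(use_voms, vo)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH; set up the grid environment first")
    return subprocess.run(cmd).returncode
```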

20 How to find out which data exists
• AMI metadata: http://ami3.in2p3.fr:8080/AMI/
• Prodsys database: http://cern.ch/atlas-php/DbAdmin/Ora/php-4.3.4/proddb/monitor/Datasets.php
• Dataset browser: http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?overview=dslist

21 How to access data?
• Download with dq2_get and analyze locally
  - Works now, but is not scalable
• Data is distributed over the sites, and jobs are sent to the sites to analyze the data
  - DA wants to promote this way of working
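Sending jobs to the data means the submission tool first looks up which sites hold a complete copy of the input dataset and only considers those. The toy Python sketch below shows that selection; the replica map, the dataset name, and the set of analysis-enabled sites are hypothetical inputs, not output from DQ2.

```python
from typing import Dict, List, Set


def sites_for_job(
    dataset: str,
    replicas: Dict[str, Dict[str, bool]],
    analysis_sites: Set[str],
) -> List[str]:
    """Return the sites where a job reading `dataset` could run.

    `replicas` maps dataset name -> {site: replica_is_complete}. Only complete
    replicas at sites that accept analysis jobs qualify.
    """
    holders = replicas.get(dataset, {})
    return sorted(site for site, complete in holders.items() if complete and site in analysis_sites)
```

An empty result means no suitable site holds the data, which leaves only the non-scalable fallback of downloading it with dq2_get.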

22 Dataset distribution
• In principle data should be everywhere
  - AOD & ESD during this year: ~30 TB max
• Three steps
  - Consolidation: not all data can be consolidated (other grids, Tier-2s)
  - Distribution between Tier-1s: not yet perfect
  - Distribution to Tier-2s: can only be the next step

23 CSC11 AOD Data at Tier-1
Site Datasets Complete Files Size
ASGC 96145634520
BNL 226131170531736
CERN 253106166101712
CNAF 1021739
FZK 1631510172
LYON 72136518786
RAL 10158993
SARA 2025138
PIC 73916105
TRIUMF 7351077

24 CSC11 ESD Data at Tier-1
Site Datasets Complete Files Size
ASGC 62633442721
BNL 14196114949507
CERN 99743724569
CNAF 21163213
FZK 0000
LYON 1011
RAL 80403428
SARA 1011
PIC 10147193
TRIUMF 5255518

25 Monitoring of transfers

26 Dataset conclusion
• AOD analysis at BNL, CERN, LYON
• ESD analysis only at BNL
• We still have to work hard to complete the "collection" of data
• We have to push hard to achieve equal distribution between sites
• Nevertheless: this is big progress compared to a few months ago!

27 Dataset details
• BNL: http://www.usatlas.bnl.gov/~dial/atprod/validation/html/bnl_datasets.html
• CERN: http://lapp.in2p3.fr/atlas/Informatique/Offline/CERNCAF_csc11/AOD/list_CC.html
• LYON: http://lapp.in2p3.fr/atlas/Informatique/Offline/LYONDISK_csc11/AOD/list_CC.html

28 DQ2 end user tools
• dq2_ls: list datasets and files
• dq2_get: download a dataset
• dq2_put: create a dataset
• dq2_poolFCjob0: create a PoolFileCatalog to access data locally
• Details: https://uimon.cern.ch/twiki/bin/view/Atlas/UsingDQ2
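A PoolFileCatalog is an XML file that maps each file's GUID to its physical and logical names so that local jobs can find the files. The Python sketch below writes a minimal catalog with that structure using only the standard library. It is not the output of dq2_poolFCjob0; the element layout follows the general POOL XML catalog format, and the GUID, paths, and `filetype` value should be treated as assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def pool_file_catalog(files: Iterable[Tuple[str, str, str]]) -> str:
    """Build a minimal POOL-style XML catalog from (guid, pfn, lfn) tuples."""
    root = ET.Element("POOLFILECATALOG")
    for guid, pfn, lfn in files:
        entry = ET.SubElement(root, "File", ID=guid)
        physical = ET.SubElement(entry, "physical")
        ET.SubElement(physical, "pfn", filetype="ROOT_All", name=pfn)
        logical = ET.SubElement(entry, "logical")
        ET.SubElement(logical, "lfn", name=lfn)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(pool_file_catalog([
        ("00000000-0000-0000-0000-000000000001",  # hypothetical GUID
         "/data/example/AOD.pool.root",            # hypothetical local path
         "AOD.pool.root"),
    ]))
```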

29 Let's try out the DQ2 end user tools
• Log in to lxplus
• source /afs/cern.ch/project/gd/LCG-share/sl3/etc/profile.d/grid_env.sh
• alias dq2=/afs/cern.ch/atlas/offline/external/GRID/ddm/pro02/dq2
• source /afs/usatlas.bnl.gov/Grid/Don-Quijote/dq2_user_client/setup.sh.CERN

30 Summary
• Several tools are available to perform distributed analysis
  - Integrated with DQ2
• Data is being collected and also distributed
  - Still a lot of work in front of us
• We are learning how to handle user jobs
  - Job priorities on LCG
  - Multitasking pilots in PANDA

31 Next steps
• Increase the number of sites
  - We have to push to get the data to all Tier-1s; they are the backbone of the ATLAS data distribution
• Interoperability
  - Will certainly be an issue for the next software week

