AliEn and GSI
Marian Ivanov

Outline
- GSI experience
- AliEn experience
- Proposals for further improvement

GSI SE (1)
Goal: central data transfer and mirroring.
Working: ~20 TB of data at the GSI SE.
Missing parts:
- A central strategy for data cleaning. Current status: the SE is almost full (20 of 27 TB used).
- Central data management (at the level of one SE).
- A way to check the data volume: the data are registered under cryptic physical file names (a counting sketch follows below).
- Creating a file list per SE, e.g. find -SE GSI::SE yyy xxx
- Zombie files?
- Quotas per user?
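As a minimal illustration of the last points, the sketch below counts catalogue entries per user from a plain-text file list (one LFN per line), such as the saved output of the find command above. The listing file name and the LFN layout (/alice/cern.ch/user/<initial>/<user>/...) are assumptions for illustration, not an existing tool.

# Minimal sketch: estimate per-user usage on one SE from a saved
# catalogue listing (one logical file name per line). File name and
# LFN layout are assumptions for illustration.
from collections import Counter

def count_files_per_user(listing_path):
    counts = Counter()
    with open(listing_path) as listing:
        for line in listing:
            parts = line.strip().split("/")
            # Expect ['', 'alice', 'cern.ch', 'user', 'm', 'mivanov', ...]
            if len(parts) > 5 and parts[3] == "user":
                counts[parts[5]] += 1
            else:
                counts["<non-user>"] += 1
    return counts

if __name__ == "__main__":
    for user, n in count_files_per_user("gsi_se_files.txt").most_common():
        print("%8d  %s" % (n, user))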

GSI SE (2)
Extensive tests of the IO configuration were performed at GSI (Mischa Zynovyev, see next presentation).
Test example: compare the speed of parallel jobs (a "train of tasks") processing local data, parallel jobs processing data from a remote file server, and parallel jobs processing data from a Lustre cluster. One analysis job processes files of 100 events each.

GSI CE
The GSI farm: the biggest part consists of 8-core machines, shared between Grid (AliEn), batch, and PROOF.
Optimal configuration:
- 1-2 slots per node for IO-intensive jobs (currently PROOF).
- The rest for CPU-intensive jobs (simulation, reconstruction).
Analysis jobs are IO-intensive, while simulation and MC reconstruction are CPU-intensive. Is it possible to regulate the ratio between IO- and CPU-intensive jobs (see the sketch below)? And is a reconstruction job reading raw data IO- or CPU-bound?
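To make the proposed slot policy concrete, here is a minimal sketch; the node size and the IO-slot cap are taken from the slide, and the function is illustrative, not an existing scheduler interface.

# Minimal sketch of a per-node slot policy: cap IO-bound slots (e.g. PROOF)
# and fill the remaining cores with CPU-bound work (simulation, reconstruction).
CORES_PER_NODE = 8   # typical GSI farm node (from the slide)
MAX_IO_SLOTS = 2     # 1-2 IO-intensive slots per node (from the slide)

def assign_slots(pending_io, pending_cpu):
    """Return (io_slots, cpu_slots) to start on one free node."""
    io_slots = min(pending_io, MAX_IO_SLOTS)
    cpu_slots = min(pending_cpu, CORES_PER_NODE - io_slots)
    return io_slots, cpu_slots

# Example: with both queues full, a node runs 2 IO + 6 CPU slots.
print(assign_slots(pending_io=10, pending_cpu=50))   # -> (2, 6)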

Grid experience
What can go wrong (with non-zero probability) will go wrong:
- Input data (temporarily) not accessible (not staged)
- Output SE (temporarily) not accessible
- Improper environment on the computing elements (impossible to compile macros, par files)
- Other runtime errors
- ...

Grid stability
Input data access:
- Dynamic staging: part of the data listed on the "official" web site is no longer available.
- An SE temporarily not available (e.g. planned maintenance).
- Part of an SE not available (e.g. one file server not responding; the GSI problem in February).
- Desynchronization in the file catalogue (e.g. a change of the file server name; a current problem at GSI).
Warning example: if a job uses n input files and the probability that a single file is not accessible is x, then the probability that the job does not crash is p(n,x) = (1-x)^n. For a job with 100 input files and x = 1%, p = 0.99^100 ≈ 36.6%.
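The numbers in the warning example can be checked directly; a short computation of the formula from the slide (plain Python, no assumptions beyond the formula itself):

# Probability that a job with n input files survives when each file is
# unavailable independently with probability x: p(n, x) = (1 - x) ** n.
def p(n, x):
    return (1.0 - x) ** n

print(p(100, 0.01))   # ~0.366: a 100-file job with 1% per-file failures
print(p(10, 0.01))    # ~0.904: smaller jobs are far more robust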

Grid stability
Conditions change dynamically in time.
Goal: get the status of the components needed by a job at the moment closest to the actual execution time, optimally during job splitting.
Consequences:
- Gain: no waste of computing resources.
- Cost: more resources for the job splitter, but negligible; the time to check is orders of magnitude smaller than the actual computation time.
- The user-developer becomes aware of possible problems quickly (feedback time ~ minute(s)) and can react without waiting for the jobs to fail on the worker nodes.
The fraction of available data should be a parameter in the JDL file (a sketch follows below).
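A minimal sketch of that check, assuming a hypothetical JDL attribute (called required_fraction here) and a hypothetical per-file availability probe; neither is an existing AliEn interface:

# Sketch: refuse (or warn about) a submission whose input data availability
# falls below the fraction requested in the JDL. `required_fraction` and
# `is_accessible` are illustrative stand-ins, not existing AliEn names.
def check_submission(input_files, required_fraction, is_accessible):
    available = [f for f in input_files if is_accessible(f)]
    fraction = float(len(available)) / len(input_files)
    if fraction < required_fraction:
        raise RuntimeError("only %.0f%% of the input data is accessible "
                           "(required %.0f%%)"
                           % (100 * fraction, 100 * required_fraction))
    return available  # split the job only over accessible files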

Proposal: job submission algorithm (0)
1. Linear loop over the file list:
- Check the availability of the files.
- Make a list of computing elements (CEs) close to the available data.
2. Linear loop over the selected CEs:
- Check the environment for the given type of job.
- Select allowed and forbidden CEs.
Split the input list into two lists, available and non-available, and optionally provide both lists to the user, then continue with the default splitting (the two loops are sketched below).
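A sketch of the two loops, assuming hypothetical probes (is_accessible, close_ces, environment_ok) that stand in for the actual catalogue and CE queries:

# Sketch of the proposed pre-selection. All three probe functions are
# hypothetical stand-ins for catalogue/CE queries.
def preselect(input_files, is_accessible, close_ces, environment_ok):
    # 1. Linear loop over the file list: availability and nearby CEs.
    available, non_available = [], []
    candidate_ces = set()
    for f in input_files:
        if is_accessible(f):
            available.append(f)
            candidate_ces.update(close_ces(f))
        else:
            non_available.append(f)

    # 2. Linear loop over the selected CEs: check the job environment.
    allowed = {ce for ce in candidate_ces if environment_ok(ce)}
    forbidden = candidate_ces - allowed

    # Both file lists can be shown to the user before the default
    # splitting proceeds over `available` and `allowed`.
    return available, non_available, allowed, forbidden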

Proposal: job submission algorithm (1)
How to develop the algorithm:
Phase 1:
- Use preprocessing scripts for the JDL + XML (input data and output SE availability).
- New information is needed in the JDL: a pre-validation script that checks the environment before the actual job is submitted.
- The pre-validation scripts (without data input) have to have priority.
- At minimum, try to compile and use a generic HalloWorld.par.
Phase 2:
- Integrate the tested algorithm into the job splitter.
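A minimal sketch of such a pre-validation step, assuming a small ROOT macro (hypothetically named testPar.C) that builds and loads the test par file; the macro name is illustrative, while root -b -q is the standard ROOT batch invocation:

# Sketch of a pre-validation wrapper: try to compile/load a trivial par
# file on the worker node before real jobs are sent there. The macro name
# is a hypothetical stand-in.
import subprocess
import sys

def prevalidate(macro="testPar.C"):
    # `root -b -q -l macro` runs the macro in batch mode and exits;
    # a non-zero exit code signals a broken environment.
    return subprocess.call(["root", "-b", "-q", "-l", macro]) == 0

if __name__ == "__main__":
    if prevalidate():
        print("environment OK - safe to submit the actual job")
    else:
        print("pre-validation failed - do not submit real jobs here")
        sys.exit(1)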