Staging to CAF + User groups + fairshare Jan Fiete Grosse-Oetringhaus, CERN PH/ALICE Offline week, 08.04.08.


Datasets in Practice
Create an AliEn collection (in aliensh):
  find -c myCollection /alice/sim/2007/LHC07c/pp_minbias/8051 root_archive.zip
  find -c myCollection /alice/sim/2007/LHC07c/pp_minbias/8051 AliESDs.root
Use a ROOT version that supports datasets:
  LXPLUS: source /afs/cern.ch/alice/caf/caf-lxplus-datasets.sh
  OR: check out from ROOT SVN: branches/dev/proof
Create a dataset from the AliEn collection:
  Connect to AliEn: TGrid::Connect("alien://")
  gridColl = gGrid->OpenCollection("alien:///alice/cern.ch/user/j/jgrosseo/myCollection")
  proofColl = gridColl->GetFileCollection();
  proofColl->SetAnchor("AliESDs.root"); // collection of root_archive.zip
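For convenience, the individual commands above can be collected into a single ROOT macro. This is only a sketch using the example collection path from the slide; adapt the path and the anchor to your own collection.

// makeDataset.C -- sketch only: turn an AliEn xml collection into a
// TFileCollection that can later be registered as a PROOF dataset.
// The collection path is the example from the slide; adapt it to your own.
void makeDataset()
{
   // authenticate against AliEn (a valid certificate/token is required)
   TGrid::Connect("alien://");

   // open the collection created with "find -c myCollection ..."
   TGridCollection *gridColl =
      gGrid->OpenCollection("alien:///alice/cern.ch/user/j/jgrosseo/myCollection");

   // convert it into the file collection format PROOF understands
   TFileCollection *proofColl = gridColl->GetFileCollection();

   // the files are zip archives; tell PROOF which member to open
   proofColl->SetAnchor("AliESDs.root");

   proofColl->Print();
}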

Datasets in Practice (2)
Upload to the PROOF cluster:
  Connect to PROOF:
  TProof::Mgr("lxb6046")->SetROOTVersion("vPROOFDSMGR");
  TProof::Open("lxb6046");
  gProof->RegisterDataSet("myDataSet", proofColl);
Check the status:
  gProof->ShowDataSets();
Use it:
  mgr->StartAnalysis("proof", "myDataSet");
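Put together, the registration and processing steps look roughly like the macro below; the head node "lxb6046" and the ROOT version tag are the ones quoted on the slide, while the analysis manager "mgr" is assumed to be configured elsewhere.

// registerAndRun.C -- sketch: register the TFileCollection built above and
// process it with an already-configured AliAnalysisManager ("mgr").
// "lxb6046" and "vPROOFDSMGR" are the head node / ROOT version from the slide.
void registerAndRun(TFileCollection *proofColl, AliAnalysisManager *mgr)
{
   // select the dataset-aware ROOT version on the cluster, then connect
   TProof::Mgr("lxb6046")->SetROOTVersion("vPROOFDSMGR");
   TProof::Open("lxb6046");

   // register the collection under a name; the stager will pull the files in
   gProof->RegisterDataSet("myDataSet", proofColl);

   // list the datasets visible to you (common ones + your group's)
   gProof->ShowDataSets();

   // run the analysis on the dataset by name
   mgr->StartAnalysis("proof", "myDataSet");
}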

Dataset in Practice (3)
List available datasets: gProof->ShowDataSets()
You always see the common datasets and the datasets of your group.

Dataset in Practice (4)

Status
First users have started staging.
A bug fix in CASTOR was needed → deployed this Monday.
Problem with files contained in zip archives → in progress.
(Plot: disk usage over the last week.)

User Groups
The available disk is shared between groups using quotas.
The priority of concurrent processes is governed by a CPU fairshare mechanism.
Groups are e.g. PWGx, TPC, ACO.
→ You need to provide your group (private mail to me or to the ATF list).
  Otherwise you cannot stage data and your relative priority will be low.
~40 users in 14 groups.

CPU Fairshare (slide from Marco Meoni)
The fairshare mechanism: measure the difference between real usages and quotas, compute new usages by applying a correction formula, and store the computed usages in CAF (averaged every 6 hours, retrieved every 5 minutes).
Each group's usage is mapped into an interval defined per group: [α·quota .. β·quota].
Correction formula: f(x) = α·q + β·q·exp(kx), with k = (1/q)·ln(1/4), where q is the group quota.
(Plot: f(x) between usageMin and usageMax around the quota q.)
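To make the shape of the correction concrete, here is a small stand-alone sketch. The coefficient values and the interpretation of x (taken here as usage minus quota) are assumptions for illustration only, since the transcript does not preserve the original symbols.

#include <cmath>
#include <cstdio>
#include <initializer_list>

// Illustrative sketch only: evaluate a correction of the shape quoted on the
// slide, f(x) = alpha*q + beta*q*exp(k*x) with k = (1/q)*ln(1/4).
// alpha, beta and the meaning of x are placeholders, not the actual CAF values.
double fairshareCorrection(double usage, double quota, double alpha, double beta)
{
   const double k = std::log(0.25) / quota;   // k = (1/q) * ln(1/4)
   const double x = usage - quota;            // assumed definition of x
   return alpha * quota + beta * quota * std::exp(k * x);
}

int main()
{
   // toy numbers: quota of 20 (as in the group list below), varying usage,
   // alpha/beta chosen arbitrarily for the illustration
   for (double usage : {0.0, 10.0, 20.0, 40.0}) {
      std::printf("usage=%5.1f -> f=%6.2f\n",
                  usage, fairshareCorrection(usage, 20.0, 0.1, 0.4));
   }
   return 0;
}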

Usage Input to PROOF
default=  EMCAL=20  PHOS=20  proofteam=20  ITS=20  MUON=20  PWG0=20  ZDC=20  PWG1=20  PWG2=20  PWG3=  T0=20  PWG4=20  …
Values are aggregated over one month. Currently dominated by usage in the group "default" over the last weeks.

Summary
Staging + disk quotas:
  User staging is slowly starting.
  Still problems with the xrootd client in ROOT (crashes, stalling) → causes files to disappear; they are restaged automatically.
CPU fairshare:
  Users assigned to groups.
  Mechanism running → the priorities that are fed into PROOF will also be published in MonALISA.

Backup

Good User Practices
Before you start using CAF:
  Subscribe to … using CERN SIMBA.
  Read … .
Code development:
  Try your code on at least 2 files locally; 1 file may hide problems that only show up when switching to the next file.
  Run your code "as in PROOF": just change "proof" to "local" in StartAnalysis (see the sketch below).
  Run "full PROOF".
  Don't use TProof::Reset if it is not needed (current issue).
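The "same code, two modes" advice can be illustrated with the following sketch; "mgr", "chain" and the dataset name are placeholders, assuming an analysis manager configured as in the earlier slides.

// runAnalysis.C -- sketch of the "same code, two modes" practice.
// "mgr" is an already-configured AliAnalysisManager and "chain" a TChain of
// local ESD files; both names are illustrative.
void runAnalysis(AliAnalysisManager *mgr, TChain *chain, Bool_t useProof)
{
   if (useProof) {
      // full PROOF run on the registered dataset
      TProof::Open("lxb6046");
      mgr->StartAnalysis("proof", "myDataSet");
   } else {
      // identical code path, executed locally on a couple of files first
      mgr->StartAnalysis("local", chain);
   }
}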

CAF Schema
(Diagram: the CAF computing cluster consists of a PROOF master / xrootd redirector and many PROOF workers, each with a local disk acting as the disk buffer. Data is staged into the cluster from the Tier-1 data export, an AliEn SE, and the CASTOR MSS (tape).)

Staging – Technical side
Step 3 (now): Automatic
  Staging script plugged into olbd.
  Implementation of PROOF datasets (by ALICE).
  Staging daemon that runs on the cluster.
  Transparent migration from AliEn collections to PROOF datasets.
  Convenient for users, quota-enabled, garbage collection.

Introduction of PROOF datasets
A dataset represents a list of files (e.g. physics run X):
  Correspondence between an AliEn collection and a PROOF dataset.
Users register datasets:
  The files contained in a dataset are automatically staged from AliEn (and kept available).
  Datasets are used for processing with PROOF.
Datasets contain all relevant information to start processing (location of files, abstract description of the content of the files).
File-level storage is handled by the underlying xrootd infrastructure.
Datasets are public for reading (you can use datasets from anybody!), and there are common datasets (for data of common interest); see the sketch below.
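As a small illustration of the "public for reading" point, the snippet below lists the visible datasets and inspects one before processing it. The dataset name is a placeholder, and the group/user-qualified form is an assumption about the dataset naming, not taken from the slides.

// browseDatasets.C -- sketch: inspect and reuse an already-registered dataset.
// The dataset name used here is a placeholder.
void browseDatasets()
{
   TProof::Open("lxb6046");

   // show the common datasets, your group's datasets and your own
   gProof->ShowDataSets();

   // fetch the metadata of a dataset registered by someone else
   // (a group/user-qualified name such as "/PWG0/jgrosseo/myDataSet" is
   //  assumed here purely for illustration)
   TFileCollection *fc = gProof->GetDataSet("/PWG0/jgrosseo/myDataSet");
   if (fc) fc->Print();
}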

Dataset concept
(Diagram: the user registers, removes and uses datasets on the PROOF master / xrootd redirector. A data manager daemon keeps the datasets persistent by requesting staging, updating file information and touching files. The olbd/xrootd file stager on the redirector selects a disk server and forwards the stage request; on the many PROOF worker / xrootd disk server nodes, files are staged to the worker-node disks and files that are not used are removed (least recently used above a threshold). Data comes in from the AliEn SE and the CASTOR MSS.)

Staging script
Two directories are configured in xrootd/olbd for staging:
  /alien
  /castor
The staging script is given with the olb.prep directive:
  A Perl script that consists of 3 threads.
  Front-end: registers the stage request.
  Back-end: checks access privileges, triggers migration from tape (CASTOR, AliEn), copies files and notifies xrootd.
  Garbage collector: cleans up following a policy file with low/high watermarks (least recently used above threshold); a sketch of this policy is given below.
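The watermark-based cleanup can be pictured with the following stand-alone sketch; the data structures, names and numbers are invented for illustration and do not reproduce the actual Perl script.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Illustrative sketch of a low/high watermark, least-recently-used cleanup,
// as described for the garbage-collector thread. All names and numbers are
// placeholders, not taken from the actual staging script.
struct StagedFile {
   std::string path;
   long size;          // bytes
   long lastAccess;    // seconds since epoch (e.g. from atime)
};

void garbageCollect(std::vector<StagedFile> &files, long diskUsed,
                    long highWatermark, long lowWatermark)
{
   if (diskUsed < highWatermark) return;   // nothing to do yet

   // oldest (least recently used) files first
   std::sort(files.begin(), files.end(),
             [](const StagedFile &a, const StagedFile &b) {
                return a.lastAccess < b.lastAccess;
             });

   // delete until usage drops below the low watermark
   for (const StagedFile &f : files) {
      if (diskUsed <= lowWatermark) break;
      std::printf("removing %s (%ld bytes)\n", f.path.c_str(), f.size);
      diskUsed -= f.size;   // the real script would unlink the file here
   }
}

int main()
{
   std::vector<StagedFile> files = {
      {"/alien/a.root", 400, 100},
      {"/alien/b.root", 300, 200},
      {"/castor/c.root", 500, 50}};
   garbageCollect(files, /*diskUsed=*/1200, /*highWatermark=*/1000, /*lowWatermark=*/600);
   return 0;
}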

Data manager daemon
Keeps the content of datasets persistent on disk.
Regularly loops over all datasets:
  Sends staging requests for new files.
  Extracts metadata from recently staged files.
  Verifies that all files are still available on the cluster (by touching them, which prevents garbage collection).
  Speed: 100 files/s.
A sketch of this loop is given below.
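A minimal sketch of one pass of such a loop, written against the ROOT dataset classes mentioned earlier; the helper functions for requesting staging and touching files are hypothetical stand-ins for the daemon's interaction with the xrootd file stager.

// dataManagerLoop.C -- sketch of one pass of the data manager daemon.
// requestStaging() and touchFile() are hypothetical stand-ins for the
// daemon's interaction with the xrootd file stager.
void requestStaging(const char *url) { /* send a stage request to the stager */ }
void touchFile(const char *url)      { /* update access time to protect from GC */ }

void dataManagerLoop(TList *datasets /* list of TFileCollection* */)
{
   TIter nextDs(datasets);
   TFileCollection *ds = nullptr;
   while ((ds = (TFileCollection *) nextDs())) {
      TIter nextFile(ds->GetList());
      TFileInfo *fi = nullptr;
      while ((fi = (TFileInfo *) nextFile())) {
         const char *url = fi->GetCurrentUrl()->GetUrl();
         if (!fi->TestBit(TFileInfo::kStaged)) {
            // new file: ask the stager to bring it to a worker disk
            requestStaging(url);
         } else {
            // staged file: touch it so the garbage collector keeps it
            touchFile(url);
         }
      }
   }
}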