
PROOF Farm preparation for Atlas FDR-1
Wensheng Deng, Tadashi Maeno, Sergey Panitkin, Robert Petkus, Ofer Rind, Torre Wenaus, Shuwei Ye (BNL)

Outline  Introduction  Atlas FDR-1  Farm preparation for FDR1  PROOF tests  Analyses Sergey Panitkin

FDR: What is it? (S. Rajagopalan, FDR meeting for U.S.)
- Provides a realistic test of the computing model from online (SFO) to analysis at the Tier-2s.
- Exercises the full software infrastructure (CondDB, TAGDB, trigger configuration, simulation with mis-alignments, etc.) using mixed events.
- Implements the calibration/alignment model.
- Implements the Data Quality monitoring.
- Specifics (from D. Charlton, T/P week):
  - Prepare a sample of mixed events that looks like raw data (bytestream).
  - Stream the events from the SFO output at Point 1, including the express and calibration streams.
  - Copy to Tier-0 (and replicate to the Tier-1s).
  - Run calibration and DQ procedures on the express/calibration streams.
  - Bulk processing afterwards, incorporating any new calibrations.
  - Distribute ESD and AOD to the Tier-1s (later to the Tier-2s as well).
  - Make TAGs and DPDs.
  - Distributed analysis.
  - Reprocess the data after a certain time.

FDR-1 Time Line (S. Rajagopalan, FDR meeting for U.S.)
- January: sample preparation, event mixing.
- Week of Feb. 4: FDR-1 run.
  - Stream data through the SFOs.
  - Transfer to Tier-0, processing of the express and calibration streams.
  - Bulk processing completed by the weekend, including ESD and AOD production.
  - Regular shifts: DQ monitoring, calibration and Tier-0 processing shifts.
  - Expert coverage at the Tier-1s as well to ensure smooth data transfer.
- Week of February 11: AOD samples transferred to the Tier-1s; DPD production at the Tier-1s.
- Week of February 18/25: all data samples should be available for subsequent analysis.
- At some later point: reprocessing at the Tier-1s and re-production of DPDs.
- FDR-1 should complete before April and feed back into FDR-2.

PROOF farm preparation  Existing Atlas PROOF was expanded in anticipation of FDR1  10 new nodes each with:  8 CPUs  16 GB RAM  500 GB Hard drive  Expect additional 64 GB Solid State Disk (SSD)  1Gb network  Standard Atlas software stack  Ganglia Monitoring  Latest version of root (currently 5.18 as of Jan. 28, 2008) Sergey Panitkin

Current Farm Configuration
"Old farm":
- 10 nodes, 4 GB RAM each
- 40 cores: 1.8 GHz Opterons
- 20 TB of HDD space (10 x 4 x 500 GB)
Extension:
- 10 nodes, 16 GB RAM each
- 80 cores: 2.0 GHz Kentsfields
- 5 TB of HDD space (10 x 500 GB)
- 640 GB of SSD space (10 x 64 GB)

Farm resource distribution issues  The new “extension” machines are “CPU heavy”:8 cores, 1 HDD  Tests showed that 1 CPU core requires ~ 10MB/s in typical I/O bound Atlas analysis  Tests showed 1 SATA HD can sustain ~ 20 MB/s, e.g. ~ 2 cores  In order to provide adequate bandwidth for 8 cores per box we needed to augment “extension” machines with SSDs  SSDs provide high bandwidth capable of sustaining 8 core load, but have relatively small volume – 64 GB per machine. They will be able to accommodate only a fraction of the expected FDR1 data.  Hence, SSD space should be actively managed  The exact scheme of data management needs to be worked out  The following slides attempt to summarize current discussion about data management with current proof farm configuration Sergey Panitkin

New Solid State Disks  Model: Mtron MSP-SATA  Capacity 64 GB  Average access time ~0.1 ms (typical HD ~10ms)  Sustained read ~120MB/s  Sustained write ~80 MB/s  IOPS (Sequential/ Random) 81,000/18,000  Write endurance >140 50GB write per day  MTBF 1,000,000 hours  7-bit Error Correction Code Sergey Panitkin

Farm resource distribution
[Diagram: the 40-core "Old Farm" with 20 TB of HDD and the 80-core Extension with 5 TB of HDD and 640 GB of SSD, organized into the Xrootd pools BNLXRDHDD1, BNLXRDHDD2 and BNLXRDSSD.]

Plans for FDR1 and beyond  Test data transfer from dCache  Direct transfer (xrdcp) via Xrootd door on dCache  Two step transfer (dccp-xrdcp) through intermediate storage  Integration with Atlas DDM  Implement dq2 registration for dataset transfers  Gain experience with SSDs  Scalability tests with SSDs and regular HDs  Choice of optimal PROOF configuration for SSD nodes  Data staging mechanism within the farm  HD to SSD data transfer  SSD space monitoring and management  Analysis policies ( free for all, analysis train, subscription, etc)  Test “fast Xrootd access” – new I/O mode for Xrootd client  Test Xrootd/PROOF federation (geographically distributed) with Wisconsin  Organize local user community to analyze FDR data Sergey Panitkin

Data Flow I  We expect that all the data (AODs, DPDs, TAGS, etc) will first arrive at dCache.  We assume that certain subset of the data will be copied from dCache to the PROOF farm for analysis in root.  This movement is expected to be done using a set of custom scripts and is initiated by the Xrootd/PROOF farm manager.  Scripts will copy datasets using xrdcp via Xrootd door on dCache.  Fall back solution exists in case Xrootd door on dCache is unstable.  Copied datasets will be registered in DQ2.  On the xrootd farm datasets will be stored on HDD space (currentely ~25 TB)  Certain high priority datasets will be copied to SSD disks by farm manager for analysis with PROOF  Determination of the high priority datasets will be done based on physics analysis priorities (FDR coordinator, PWG, etc)  The exact scheme for SSD “subscription” needs to be worked out  Subscription, On-demand loading, etc  Look at Alice Sergey Panitkin

Integration with Atlas DDM
[Diagram: Grid transfers bring data from the T0 into dCache at BNL; Panda and the "xrdcp with dq2 registration" scripts copy datasets into the /data (HDD) area of the Xrootd/PROOF farm, from where they are staged to /ssd (xrdcp driven by tentakel); since the copies are registered in DQ2, an Atlas user can locate them for analysis with dq2_ls -fp -s BNLXRDHDD1 "my_dataset".]

FDR tests  Batch analyses with Xrootd as data server  AOD analysis. Compare speed with dCache – D.Adams, H.Ma  Store (all?) TAGS on the farm  Our previous tests showed that Athena analyses gain from TAGs stored on Xrootd  Use PROOF farm for physics analysis  Athena Root Access analysis of AODs using PROOF  ARA was demonstrated to run on PROOF in January (Shuwei Ye)  Store (all?) FDR1 DPDs on the farm  FDR1 DPDs made by H. Ma already copied to the farm  DPD based analyses  Stephanie Majewski plans to study increase in the sensitivity of an inclusive SUSY search using information from isolated tracks Sergey Panitkin

Root version mismatch issues  All of datasets for FDR1 will be produced with rel. 13, which relies on root v.5.14  PROOF farm currently uses the latest production version of root This version has many improvements in functionality and stability compare to v It is recommend by PROOF developers  Due to changes in xrootd protocol clients running root v.5.14 cannot work with xrootd/PROOF servers from v.5.18  In order to run ARA analysis on PROOF or utilize it as Xrootd SE for AOD/TAG analysis, the PROOF farm needs to be downgraded to v5.14. Such downgrade will hurt root based analysis of AANT and DnPDs.  In principle we can run 2 farms in parallel  The old farm with PROOF v.5.14  The extension farm with PROOF v.5.18  The data management scheme described on previous slides can be trivially applied to both farms.  This is a temporary solution. Athena is expected to use root v 5.18 in the next release. This will largely remove version mismatch problems Sergey Panitkin

Current status  Work in progress!  File transfer from dCache is functional  New LRC was created  Files copied to Xrootd are registered in LRC via custom dq2_cr  Datasets can be found using DDM tools  dq2-list-dataset-replicas user.HongMa.fdr08_run StreamEgamma.merge.AOD.o1_r6_t1.DPD_v130040_V5 INCOMPLETE: BNLPANDA,BNLXRDHDD1 COMPLETE:  List of files in a dataset on Xrootd can be obtained via dq2_ls  Several FDR1 AOD datasets and one DPD dataset were transferred using this mechanis  Issues:  Still need better integration with DDM  Possible problem with large files transfers via dCache door Sergey Panitkin