
1 PROOF Farm preparation for Atlas FDR-1
Wensheng Deng, Tadashi Maeno, Sergey Panitkin, Robert Petkus, Ofer Rind, Torre Wenaus, Shuwei Ye (BNL)

2 Outline
- Introduction
- Atlas FDR-1
- Farm preparation for FDR-1
- PROOF tests
- Analyses

3 FDR: What is it? (S. Rajagopalan, FDR meeting for U.S.)
- Provides a realistic test of the computing model from online (SFO) to analysis at the Tier-2s.
- Exercises the full software infrastructure (CondDB, TAGDB, trigger configuration, simulation with misalignments, etc.) using mixed events.
- Implements the calibration/alignment model.
- Implements the Data Quality monitoring.
- Specifics (from D. Charlton, T/P week):
  - Prepare a sample of mixed events that looks like raw data (bytestream)
  - Stream the events from the SFO output at Point 1, including express and calibration streams
  - Copy to Tier-0 (and replicate to the Tier-1s)
  - Run calibration and DQ procedures on the express/calibration streams
  - Bulk processing after 24-48 hours, incorporating any new calibrations
  - Distribute ESD and AOD to the Tier-1s (later to the Tier-2s as well)
  - Make TAGs and DPDs
  - Distributed analysis
  - Reprocess the data after a certain time

4 FDR-1 Time Line (S. Rajagopalan, FDR meeting for U.S.)
- January: sample preparation, mixing events
- Week of Feb. 4: FDR-1 run
  - Stream data through the SFOs
  - Transfer to T0; processing of the express and calibration streams
  - Bulk processing completed by the weekend, including ESD and AOD production
  - Regular shifts: DQ monitoring, calibration, and Tier-0 processing shifts
  - Expert coverage at the Tier-1s as well, to ensure smooth data transfer
- Week of February 11: AOD samples transferred to the Tier-1s; DPD production at the Tier-1s
- Week of February 18/25: all data samples should be available for subsequent analysis
- At some later point: reprocessing at the Tier-1s and re-production of DPDs
- FDR-1 should complete before April and feed back into FDR-2

5 PROOF farm preparation
- The existing Atlas PROOF farm @BNL was expanded in anticipation of FDR-1
- 10 new nodes, each with:
  - 8 CPU cores
  - 16 GB RAM
  - 500 GB hard drive
  - an additional 64 GB solid state disk (SSD) expected
  - 1 Gb network
- Standard Atlas software stack
- Ganglia monitoring
- Latest production version of ROOT (5.18 as of Jan. 28, 2008)

6 Current Farm Configuration
"Old farm"
- 10 nodes, 4 GB RAM each
- 40 cores: 1.8 GHz Opterons
- 20 TB of HDD space (10 x 4 x 500 GB)
Extension
- 10 nodes, 16 GB RAM each
- 80 cores: 2.0 GHz Kentsfields
- 5 TB of HDD space (10 x 500 GB)
- 640 GB of SSD space (10 x 64 GB)

7 Farm resource distribution issues
- The new "extension" machines are "CPU heavy": 8 cores, 1 HDD
- Tests showed that 1 CPU core requires ~10 MB/s in a typical I/O-bound Atlas analysis
- Tests showed that 1 SATA HDD can sustain ~20 MB/s, i.e. enough for ~2 cores
- To provide adequate bandwidth for 8 cores per box, we needed to augment the "extension" machines with SSDs
- SSDs provide bandwidth capable of sustaining an 8-core load, but have relatively small volume (64 GB per machine); they can accommodate only a fraction of the expected FDR-1 data
- Hence, SSD space must be actively managed
- The exact data management scheme still needs to be worked out
- The following slides summarize the current discussion of data management with the current PROOF farm configuration
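A back-of-envelope check of these numbers (the per-core and per-device rates are the ones quoted on this slide and the next; the script itself is only an illustration):

```bash
#!/bin/sh
# Rates in MB/s, as quoted on the slides; integer arithmetic is enough here.
CORE_IO=10    # per-core I/O demand in an I/O-bound Atlas analysis
HDD_BW=20     # sustained read of one SATA HDD
SSD_BW=120    # sustained read of one Mtron SSD

echo "cores fed by one HDD: $((HDD_BW / CORE_IO))"   # -> 2
echo "cores fed by one SSD: $((SSD_BW / CORE_IO))"   # -> 12, covers an 8-core node
```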

8 New Solid State Disks
- Model: Mtron MSP-SATA7035064
- Capacity: 64 GB
- Average access time: ~0.1 ms (typical HDD: ~10 ms)
- Sustained read: ~120 MB/s
- Sustained write: ~80 MB/s
- IOPS (sequential/random): 81,000 / 18,000
- Write endurance: >140 years @ 50 GB written per day
- MTBF: 1,000,000 hours
- 7-bit Error Correction Code

9 Farm resource distribution
[Diagram: the "Old Farm" (40 cores, 20 TB HDD) and the Extension (80 cores, 5 TB HDD, 640 GB SSD), exposed as the storage pools BNLXRDHDD1, BNLXRDHDD2, and BNLXRDSSD.]

10 Plans for FDR-1 and beyond
- Test data transfer from dCache
  - Direct transfer (xrdcp) via the Xrootd door on dCache
  - Two-step transfer (dccp + xrdcp) through intermediate storage (see the sketch below)
- Integration with Atlas DDM
  - Implement dq2 registration for dataset transfers
- Gain experience with SSDs
  - Scalability tests with SSDs and regular HDDs
  - Choice of the optimal PROOF configuration for SSD nodes
- Data staging mechanism within the farm
  - HDD-to-SSD data transfer
  - SSD space monitoring and management
  - Analysis policies (free-for-all, analysis train, subscription, etc.)
- Test "fast Xrootd access", a new I/O mode for the Xrootd client
- Test a geographically distributed Xrootd/PROOF federation with Wisconsin
- Organize the local user community to analyze FDR data
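A sketch of the two transfer modes under test; the door and farm hostnames and the file path are hypothetical placeholders, not the actual BNL endpoints:

```bash
#!/bin/sh
# Direct mode: pull a file from the dCache Xrootd door straight into the farm.
xrdcp root://dcdoor.example.bnl.gov//pnfs/usatlas.bnl.gov/AOD.pool.root \
      root://xrdfarm.example.bnl.gov//data/AOD.pool.root

# Two-step fallback (used if the Xrootd door is unstable): dccp to
# intermediate storage first, then xrdcp from there into the farm.
dccp /pnfs/usatlas.bnl.gov/AOD.pool.root /scratch/AOD.pool.root
xrdcp /scratch/AOD.pool.root root://xrdfarm.example.bnl.gov//data/AOD.pool.root
```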

11 Data Flow I
- We expect that all the data (AODs, DPDs, TAGs, etc.) will first arrive at dCache.
- We assume that a certain subset of the data will be copied from dCache to the PROOF farm for analysis in ROOT.
- This movement is expected to be done with a set of custom scripts and is initiated by the Xrootd/PROOF farm manager.
- The scripts will copy datasets using xrdcp via the Xrootd door on dCache.
- A fallback solution exists in case the Xrootd door on dCache is unstable.
- Copied datasets will be registered in DQ2.
- On the Xrootd farm, datasets will be stored on HDD space (currently ~25 TB).
- Certain high-priority datasets will be copied to the SSDs by the farm manager for analysis with PROOF (see the staging sketch below).
- High-priority datasets will be selected according to physics analysis priorities (FDR coordinator, PWG, etc.)
- The exact scheme for SSD "subscription" needs to be worked out
  - Subscription, on-demand loading, etc.
  - Look at what Alice does
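A minimal sketch of what the HDD-to-SSD staging step could look like; the mount points, the dataset-per-directory layout, and the script itself are assumptions, not the farm's actual tooling:

```bash
#!/bin/sh
# stage_to_ssd.sh <dataset-dir>: promote a high-priority dataset from the
# HDD pool to the small SSD pool, refusing if it would not fit.
DATASET="$1"
HDD=/data        # assumed HDD mount point
SSD=/ssd         # assumed SSD mount point

NEED=$(du -sm "$HDD/$DATASET" | cut -f1)        # dataset size, MB
FREE=$(df -m "$SSD" | awk 'NR==2 {print $4}')   # free space on the SSD, MB

if [ "$NEED" -lt "$FREE" ]; then
    cp -r "$HDD/$DATASET" "$SSD/"
else
    echo "SSD full: need ${NEED} MB, have ${FREE} MB" >&2
    exit 1
fi
```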

12 Integration with Atlas DDM
[Diagram: data flow for DDM integration. Grid transfers bring data from the T0 via Panda/DQ2 into dCache; xrdcp with dq2 registration (run in parallel via tentakel) copies it to the farm's /data area, with staging to /ssd; an Atlas user locates the files with "dq2_ls -fp -s BNLXRDHDD1 my_dataset" and runs the analysis.]

13 FDR tests
- Batch analyses with Xrootd as the data server
  - AOD analysis; compare speed with dCache (D. Adams, H. Ma)
  - Store (all?) TAGs on the farm
  - Our previous tests showed that Athena analyses gain from TAGs stored on Xrootd
- Use the PROOF farm for physics analysis (see the session sketch below)
  - Athena ROOT Access (ARA) analysis of AODs using PROOF
  - ARA was demonstrated to run on PROOF in January (Shuwei Ye)
  - Store (all?) FDR-1 DPDs on the farm
  - FDR-1 DPDs made by H. Ma have already been copied to the farm
  - DPD-based analyses
  - Stephanie Majewski plans to study the increase in sensitivity of an inclusive SUSY search using information from isolated tracks
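For illustration, a minimal PROOF session against the farm, driven from the shell; the master hostname, file URL, tree name, and selector are hypothetical placeholders:

```bash
#!/bin/sh
# Everything between the heredoc markers is ROOT/CINT input.
root -b -l <<'EOF'
  // Connect to the farm's PROOF master (hypothetical hostname).
  TProof *p = TProof::Open("proofmaster.example.bnl.gov");

  // Build a data set from files staged on the SSD pool (hypothetical path
  // and tree name) and process it with a user-supplied TSelector.
  TDSet *d = new TDSet("TTree", "CollectionTree");
  d->Add("root://xrdfarm.example.bnl.gov//ssd/StreamEgamma/AOD.pool.root");
  p->Process(d, "MySelector.C+");
  .q
EOF
```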

14 ROOT version mismatch issues
- All FDR-1 datasets will be produced with release 13, which relies on ROOT v5.14
- The PROOF farm currently runs the latest production version of ROOT, 5.18, which has many improvements in functionality and stability compared to v5.14 and is recommended by the PROOF developers
- Due to changes in the xrootd protocol, clients running ROOT v5.14 cannot work with xrootd/PROOF servers from v5.18
- To run ARA analysis on PROOF, or to use the farm as an Xrootd SE for AOD/TAG analysis, the PROOF farm would need to be downgraded to v5.14; such a downgrade would hurt ROOT-based analysis of AANTs and DnPDs
- In principle we can run two farms in parallel (see the sketch below)
  - The old farm with PROOF v5.14
  - The extension farm with PROOF v5.18
- The data management scheme described on the previous slides applies trivially to both farms
- This is a temporary solution: Athena is expected to use ROOT v5.18 in the next release, which will largely remove the version mismatch problem
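One way the two-farm setup could be hidden from users is a small wrapper that picks the master matching the client's ROOT version; the hostnames and the wrapper itself are hypothetical, not part of the actual setup:

```bash
#!/bin/sh
# Pick the PROOF master whose server version matches the local ROOT client.
case "$(root-config --version)" in
  5.14*) PROOF_MASTER=oldfarm.example.bnl.gov ;;   # rel. 13 / ARA jobs
  5.18*) PROOF_MASTER=extfarm.example.bnl.gov ;;   # AANT / DnPD jobs
  *)     echo "unsupported ROOT version" >&2; exit 1 ;;
esac
echo "connect with: TProof::Open(\"$PROOF_MASTER\")"
```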

15 Current status
- Work in progress!
- File transfer from dCache is functional
- A new LRC was created
- Files copied to Xrootd are registered in the LRC via a custom dq2_cr
- Datasets can be found using DDM tools, e.g.:
  - dq2-list-dataset-replicas user.HongMa.fdr08_run1.0003050.StreamEgamma.merge.AOD.o1_r6_t1.DPD_v130040_V5
  - INCOMPLETE: BNLPANDA, BNLXRDHDD1 / COMPLETE: (none)
- The list of files in a dataset on Xrootd can be obtained via dq2_ls
- Several FDR-1 AOD datasets and one DPD dataset were transferred using this mechanism
- Issues:
  - Still need better integration with DDM
  - Possible problem with large-file transfers via the dCache door
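Taken together, the user-side lookup looks roughly like this; the replica-list call and the dq2_ls flags are the ones shown on the slides, applied to the dataset named above:

```bash
#!/bin/sh
# Where does the dataset live? (replica list, as shown on the slide)
dq2-list-dataset-replicas user.HongMa.fdr08_run1.0003050.StreamEgamma.merge.AOD.o1_r6_t1.DPD_v130040_V5

# List the physical file paths of the replica at the farm's HDD pool.
dq2_ls -fp -s BNLXRDHDD1 user.HongMa.fdr08_run1.0003050.StreamEgamma.merge.AOD.o1_r6_t1.DPD_v130040_V5
```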

