PROOF tests at BNL
Sergey Panitkin, Robert Petkus, Ofer Rind (BNL)
May 28, 2008, Ann Arbor, MI

Outline
- Hardware tests
  - SSD vs HDD
- PROOF cluster federation tests

Current BNL PROOF Farm Configuration

“Old farm” - production
- 40 cores: 1.8 GHz Opterons
- 20 TB of HDD space (10x4x500 GB)
- Open for users (~14 for now)
- ROOT 5.14 (for rel 13 compatibility)
- All FDR1 AODs and DPDs available
- Used as an Xrootd SE by some users
- Datasets in LRC (-s BNLXRDHDD1)

“New farm” - test site
- 10 nodes, 16 GB RAM each
- 80 cores: 2.0 GHz Kentsfields
- 5 TB of HDD space (10x500 GB)
- 640 GB of SSD space (10x64 GB)
- Closed for users (for now)
- Latest versions of ROOT
- Hardware and software tests

New Solid State Disks at BNL
- Model: Mtron MSP-SATA
- Capacity: 64 GB
- Average access time: ~0.1 ms (typical HDD ~10 ms)
- Sustained read: ~120 MB/s
- Sustained write: ~80 MB/s
- IOPS (sequential/random): 81,000 / 18,000
- Write endurance: >140 years @ 50 GB written per day
- MTBF: 1,000,000 hours
- 7-bit Error Correction Code

Test configuration
- 1+1 or 1+8 node PROOF farm configurations
- 2x4-core Kentsfield CPUs per node, 16 GB RAM per node
- All default settings in software and OS
- Different configurations of SSD and HDD hardware depending on the test
- ROOT
- “PROOF Bench” suite of benchmark scripts to simulate analysis in ROOT; part of the ROOT distribution
  - Data simulate HEP events, ~1 kB per event
  - A single ~3+ GB file per PROOF worker in these tests
- Reboot before every test to avoid memory caching effects
- My set of tests emulates an interactive, command-prompt ROOT session
  - Plot one variable, scan ~10^7 events, a la D3PD analysis (a sketch of such a query is shown below)
  - Look at read performance of the I/O subsystem in the PROOF context
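The kind of query used in these tests can be reproduced from the ROOT prompt roughly as follows. This is only a sketch: the master hostname, file pattern, tree name, and branch name are illustrative placeholders, not the actual BNL farm setup.

```cpp
// Minimal sketch of an interactive PROOF query of the kind used in these tests.
// Hostname, file pattern, tree name and branch name are hypothetical.
{
   // Connect to the PROOF master, which starts workers on the farm nodes
   TProof *p = TProof::Open("proof-master.example.bnl.gov");

   // Build a chain over the benchmark files (one ~3 GB file per worker)
   TChain *chain = new TChain("CollectionTree");   // hypothetical tree name
   chain->Add("/data/ssd/bench_*.root");           // hypothetical file pattern

   // Route the chain's processing through the PROOF session
   chain->SetProof();

   // Scan a single variable over ~10^7 events; I/O-bound when data are not cached
   chain->Draw("el_pt");                           // hypothetical branch name
}
```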

SSD Tests
(Screenshot: a typical test session in ROOT)

SSD vs HDD
(Plot; the “CPU limited” label marks the region where the test is CPU-bound)
- SSD holds a clear speed advantage
- ~10 times faster in the concurrent-read scenario

SSD vs HDD
- With 1 worker: 5.3M events, 15.8 MB read out of ~3 GB of data on disk
- With 8 workers: 42.5M events, … MB read out of ~24 GB of data

SSD: single disk vs RAID
- The SSD RAID has minimal impact up to 8 simultaneously running jobs
- Behavior at 8+ workers has not been explored in detail yet

SSD: single disk vs RAID
(Plot)

HDD: single disk vs RAID
3x750 GB disks in RAID 0 (software RAID) vs a single 500 GB drive
(Plot; annotated regions: “I/O limited?” and “I/O and CPU limited?”)
- A single disk shows rather poor scaling in these tests
- The 3-disk RAID appears to support ~6 workers

SSD vs HDD: 8-node farm
(Plot: aggregate analysis rate of the 8-node farm as a function of the number of workers per node)
- Almost linear scaling with the number of nodes

Discussion
- The main issue, in the PROOF context, is to match I/O demand and supply
- Some observations from our tests (current and previous); see the sizing sketch below:
  - A scan of a single variable in ROOT generates a ~4 MB/s read load per worker
  - One SATA HDD can sustain ~2-3 PROOF workers
  - An HDD RAID array can sustain ~2xN workers (N = number of disks in the array)
  - One Mtron SSD can sustain ~8 workers
- An 8-core machine with 1 HDD makes little sense in the PROOF context, unless you feed the PROOF jobs with data from an external SE (dCache, NFS disk vault, etc.)
- Solid State Disks (SSD) are ideal for PROOF (our future?)
- We plan to continue tests with more realistic loads
  - ARA on AODs and DPDs
  - Full PROOF bench tests
  - RAID and OS optimization, etc.
- The main issue with SSD is size -> efficient data management is needed!
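As a rough illustration of how these rules of thumb fit together, the sketch below simply divides an assumed effective device bandwidth by the ~4 MB/s per-worker read load quoted above. The effective HDD and SSD rates under concurrent access are assumptions chosen to be consistent with the observed worker counts, not measured values.

```cpp
// Back-of-the-envelope worker-count estimate from the numbers on this slide.
// The effective per-device read rates under concurrent access are assumed,
// not measured.
#include <cstdio>

int main() {
    const double perWorkerMBs = 4.0;   // ~4 MB/s read load per PROOF worker (from the tests)
    const double hddMBs       = 10.0;  // assumed effective single-HDD rate under concurrent reads
    const double ssdMBs       = 32.0;  // assumed effective Mtron SSD rate in this access pattern

    // workers ~ effective device bandwidth / per-worker read rate
    std::printf("one HDD sustains ~%.1f workers\n", hddMBs / perWorkerMBs);  // ~2-3
    std::printf("one SSD sustains ~%.1f workers\n", ssdMBs / perWorkerMBs);  // ~8
    // An N-disk HDD RAID scales roughly with N, giving the ~2xN rule of thumb.
    return 0;
}
```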

PROOF Cluster federation tests
- In principle the Xrootd/PROOF design supports federation of geographically distributed clusters
  - Setup instructions at:
- An interesting capability for T3 applications
  - Pool together distributed local resources: disk, CPU
- Relatively easy to implement:
  - Requires only configuration file changes
  - Can have different configurations
- Transparent for users
  - Single “name space” - may require some planning
  - Single entry point for analysis
- New and untested capability

PROOF Cluster federation tests
- At first we could not run federation tests in ROOT - queries crashed
  - A bug in filename matching during file validation was identified during a visit by the primary PROOF developer (Gerri Ganis) to BNL in April
  - A new patched version of ROOT (…d) has been available at CERN since last week (May 21)
- We successfully configured and tested a 2-sub-domain federation at BNL
  - All that is required for the test is modified Xrootd/PROOF configuration files (and an Xrootd/PROOF daemon restart); a configuration sketch follows the next slide

Federated PROOF Clusters
Our test configuration:
- 1 super-master
- 2 sub-masters with 3 worker nodes each
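The role hierarchy behind this topology could be expressed in the Xrootd/PROOF configuration files roughly as sketched below. This is illustrative only: the hostnames and port are placeholders, and the directives just follow the general xrootd role/manager conventions; the exact directive set should be taken from the federation setup instructions mentioned earlier.

```
# Schematic sketch of the federation role hierarchy (hostnames and port are
# placeholders; consult the PROOF federation setup instructions for the
# authoritative directive set).

# --- super-master node ---
all.role manager

# --- each sub-master node ---
all.role supervisor
all.manager supermaster.example.bnl.gov 1093   # assumed xproofd port

# --- each worker node ---
all.role server
all.manager submaster1.example.bnl.gov 1093    # points to its sub-master
```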

Multi-cluster ROOT session I
(Screenshot)

Multi-cluster ROOT session II
(Screenshot)

Summary and Plans
- SSDs offer a significant performance advantage in a concurrent analysis environment
  - ~10x better read performance than HDD in our test
- We successfully tested PROOF cluster federation
- We plan to expand our hardware tests with more use cases
- SSD integration in PROOF will require understanding the data management issues involved
- We will test PROOF federation with the Wisconsin T3 to explore issues of:
  - Performance
  - Name space integration
  - Security