Analysis efficiency
Andrei Gheata, ALICE offline week, 03 October 2012
Sources for improving analysis efficiency

The analysis flow involves mixed processing phases per event:
– Reading event data from disk – sequential (!)
– De-serializing the event object hierarchy – sequential (!)
– Processing the event – parallelizable
– Cleaning the event structures – sequential
– Writing the output – sequential but parallelizable
– Merging the outputs – sequential but parallelizable

The efficiency of the analysis job:
– job_eff = (t_ds + t_proc + t_cl) / t_total
– analysis_eff = t_proc / t_total

Time/event for the different phases depends on many factors:
– T_read ~ IOPS × event_size / read_throughput – to be minimized: minimize event size, keep read throughput under control
– T_ds + T_cl ~ event_size × n_branches – to be minimized: minimize event size and complexity
– T_proc = Σ_wagons T_i – to be maximized: maximize the number of wagons and the useful processing
– T_write = output_size / write_throughput – to be minimized

[Figure: per-event timeline of t_read, t_ds, t_proc, t_cl, t_write for events #0..#m, #0..#n, #0..#p across input files, followed by t_merge]
Monitoring analysis efficiency

Instrumentation at the level of TAlienFile and AliAnalysisManager
Collecting timing, data-size transfers and efficiency for the different stages
– Correlated with site, SE, LFN, PFN
Collection of data per subjob, remote or local
– mgr->SetFileInfoLog("fileinfo.log");
– Already in action for LEGO trains
Monitored analysis info

Processed input files:

#################################################################
pfn        /11/60343/578c e1-9cd cfd8b68#AliAOD.root
url        root://xrootd3.farm.particle.cz:1094//11/60343/578c e1-9cd cfd8b68#AliAOD.root
se         ALICE::Prague::SE
image      1
nreplicas  0
openstamp  opentime  runtime  filesize  readsize  throughput
#################################################################
pfn        /13/34934/2ed51c74-618b-11e1-a1cc-63e6dd7c661e#AliAOD.root
url        root://xrootd3.farm.particle.cz:1094//13/34934/2ed51c74-618b-11e1-a1cc-63e6dd7c661e#AliAOD.root
se         ALICE::Prague::SE
image      1
nreplicas  0
openstamp  opentime  runtime  filesize  readsize  throughput

Analysis info:

#summary#########################################################
train_name  train
root_time  root_cpu  init_time  io_mng_time  exec_time
alien_site  CERN
host_name   lxbse13c04.cern.ch
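Assuming the records above are plain "field value" lines separated by `#` rules, a minimal parsing helper could look like the following. The helper itself is hypothetical, not part of the AliRoot monitoring code; only the field names come from the dump above:

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse "field value" lines of one fileinfo.log-style record into a map.
// Lines starting with '#' are record separators and are skipped.
std::map<std::string, std::string> ParseRecord(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream ls(line);
        std::string key, value;
        if (ls >> key && std::getline(ls, value)) {
            // drop the whitespace left between key and value
            value.erase(0, value.find_first_not_of(" \t"));
            fields[key] = value;
        }
    }
    return fields;
}
```

With such a map per record, per-file quantities (e.g. readsize over runtime) can be aggregated across subjobs and correlated with the SE and site fields.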
Throughput plots

A simple and intuitive way to present the results
Will allow diagnosing both the infrastructure and the analysis

[Figure: throughput [MB/sec] vs. time [sec] per input file (PFN1–PFN5), with the initialization, I/O and execution phases marked]
A few numbers for an empty analysis

Storage                       Access time                         Read size         L    Job time   Throughput   Job efficiency
Spinning disk (50 MB/s)       13 ms                               270 MB AOD PbPb   –    45.5 s     5.93 MB/s    86.5 %
SSD (266 MB/s)                0.2 ms                              270 MB AOD PbPb   –    39.5 s     6.83 MB/s    94.1 %
Inter-site (7.4 MB/s, JINR)   RTT 63 ms + local disk access (?)   AOD PbPb          200  258 s                   2.5 %
Inter-site (7.4 MB/s, JINR)   RTT 63 ms + local disk access (?)   AOD PbPb          5    46.8 s     0.46 MB/s    13.4 %

L = number of concurrent processes running on the disk storage server

I/O latency is a killer for events with many branches
De-serialization is the determinant for locally available data – it depends on the size, but ALSO on the complexity (number of branches)
The source of problems

Highly fragmented buffer queries over a high-latency network
– A big number of buffers retrieved sequentially
No asynchronous reading or prefetching enabled in xrootd or elsewhere
ROOT provides the mechanism to compact buffers and read them asynchronously: TTreeCache
– Not used until now
– Now added in AliAnalysisManager
Reading improvement – AOD

AOD PbPb, JINR::SE (RTT = 65 ms to CERN)

Cache size           Async. read   Speedup
0 (current status)   –             1
50 MB                No
50 MB                Yes
   MB                No
   MB                Yes
   MB                No
   MB                Yes
   MB                No
   MB                Yes
Reading improvement – AOD

AOD pp, LBL::SE (RTT = 173 ms to CERN)

Cache size           Async. read   Speedup
0 (current status)   –             1
50 MB                No
50 MB                Yes
   MB                Yes           20.12
Reading improvement – MC

ESD pp, CNAF::SE (RTT = 20 ms to CERN)

Cache size                   Async. read   Speedup
0 (current status)           –             1
50 MB, ESD cache only        Yes
   MB, ESD, TK, TR caches    Yes           4.22

ESD pp, CERN::EOS (RTT = 0.3 ms)

Cache size                   Async. read   Speedup
0 (current status)           –             1
50 MB, ESD, TK, TR caches    Yes           1.32
What to do to get it

For AOD or ESD data, nothing:
– The cache is set by default to 100 MB, async read enabled
– The size of the cache can be tuned via: mgr->SetCacheSize(bytes)
For MC, the cache sizes for kinematics and TR will follow the manager setting
– Don't forget to use: mcHandler->SetPreReadMode(AliMCEventHandler::kLmPreRead)
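Put together, a train configuration fragment applying these settings might look as follows. This is a sketch, not verbatim AliRoot code: the explicit cache size is illustrative (it is already the default), and the way the MC handler is attached to the manager is assumed; only SetFileInfoLog, SetCacheSize and SetPreReadMode are quoted from these slides:

```cpp
// In the train setup macro, after creating the analysis manager:
AliAnalysisManager *mgr = new AliAnalysisManager("train");
mgr->SetFileInfoLog("fileinfo.log");    // per-subjob I/O monitoring (previous slides)
mgr->SetCacheSize(100 * 1024 * 1024);   // TTreeCache size in bytes; 100 MB is the default

// For MC: kinematics/TR cache sizes follow the manager setting,
// provided pre-reading is enabled on the MC event handler:
AliMCEventHandler *mcHandler = new AliMCEventHandler();
mcHandler->SetPreReadMode(AliMCEventHandler::kLmPreRead);
mgr->SetMCtruthEventHandler(mcHandler);
```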
To do's

Feed analysis info to the alimonitor DB
– Provide real-time info about analysis efficiency and the status of data flows
– Point out site configuration and dispatching problems
TTreePerfStats-based analysis
– Check how our data structures perform and pin down potential problems