1 Optimization and usage of D3PD
Ilija Vukotic, CAF - PAF, 19 April 2011, Lyon

2 Overview
- Formats
- Optimizations
- Local tests
- Large scale tests

3 Formats
[Diagram: ESDs, AODs, and D3PDs (e.g. egamma, JETMET) for data and MC; ball surface ~ event size]
- Sizes are indicative only; in reality they depend on pile-up, stream, and repro tag.
- D3PDs are not a flat tree any more; with the additional trees, some D3PDs have ~10k branches.

4 Example
[Figure]

5 ROOT file organization
[Diagram: events split into per-branch baskets (doubles, floats) laid out in the file]
- We choose to fully split (better compression factor).
- Baskets are written to the file as soon as they get full.
- That leaves parts of the same event scattered over the file.
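The scattering described above can be pictured with a toy model (plain Python for illustration, not ROOT code): when each branch's baskets are flushed as they fill, consecutive baskets of one branch end up interleaved with all the others on disk, so reading a single branch forces one seek per basket; rewriting the file with baskets reordered by branch makes the same read contiguous.

```python
# Toy model of basket layout on disk (illustrative only, not ROOT code).
# A "basket" is tagged by the branch it belongs to; a seek is needed
# whenever two consecutively read baskets are not adjacent in the file.

def count_seeks(layout, branch):
    """Count the disk seeks needed to read every basket of `branch`."""
    positions = [i for i, b in enumerate(layout) if b == branch]
    seeks = 1  # initial seek to the first basket
    for prev, cur in zip(positions, positions[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks

branches = ["pt", "eta", "phi"]  # hypothetical branch names
n_baskets = 100                  # baskets per branch

# Baskets written as they fill: branches interleave throughout the file.
scattered = [b for _ in range(n_baskets) for b in branches]
# File rewritten with baskets reordered by branch: each branch contiguous.
reordered = [b for b in branches for _ in range(n_baskets)]

print(count_seeks(scattered, "pt"))   # one seek per basket: 100
print(count_seeks(reordered, "pt"))   # a single seek: 1
```

The same arithmetic is why the reordering done in 2010 reduced the stress on the disk systems so much for branch-wise (sparse) reading.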

6 Tradeoffs
Options:
- Split level: 99 (full)
- Zip level: 6
- Basket size: 2 kB
- Memberwise streaming
- Basket reordering: by event or by branch
- TTreeCache
- "New ROOT": AutoFlush matching the TTreeCache size
Constraints:
- Memory
- Disk size
- Read/write time
Read scenarios:
- Full sequential
- Some events
- Parts of events
- PROOF
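The zip-level option above is a pure CPU-versus-size tradeoff, and it can be sketched with Python's zlib, whose levels 1-9 behave like ROOT's default (zlib-based) compression levels. The payload below is made up for illustration.

```python
import random
import time
import zlib

# Synthetic, compressible payload standing in for basket contents
# (made-up data for illustration).
random.seed(42)
words = [b"electron", b"muon", b"jet", b"met", b"tau"]
payload = b" ".join(random.choice(words) for _ in range(200_000))

for level in (1, 6, 9):
    t0 = time.perf_counter()
    compressed = zlib.compress(payload, level)
    dt = time.perf_counter() - t0
    print(f"level {level}: {len(compressed)} bytes, {dt * 1e3:.1f} ms")

# Typically level 6 (the setting chosen above) compresses noticeably
# better than level 1 at a modest CPU cost, while level 9 gains little
# more for considerably more CPU time.
```

This is the reasoning behind settling on zip level 6 rather than the fastest or the most aggressive setting.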

7 Current settings: AODs and ESDs
- 2010: fixed-size baskets (2 kB); files re-ordered, but basket sizes not optimized at the end of production jobs.
- 2011 until now: all the trees (9 of them) given the default 30 MB of memory; basket sizes "optimized"; a tree autoflushed once its unzipped size grew larger than 30 MB.
- 17.X.Y: the largest tree, "Collection Tree", optimized with split level 0 and memberwise streaming; ESD/RDO autoflush every 5 events, AOD every 10 events; other trees back to the 2010 model.
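The autoflush choices above can be pictured with a little arithmetic (a toy model, not ROOT code): baskets are aligned on cluster boundaries, so touching one event means decompressing the whole cluster it lives in. Small clusters (5-10 events) therefore favour sparse event picking, while large 30 MB clusters favour full sequential reads. The event size and counts below are assumed round numbers.

```python
# Toy read-cost model for autoflush clusters (illustrative, not ROOT).
# Reading one event costs decompressing the whole cluster containing it.

def bytes_read(n_events, event_size, cluster_events, events_wanted):
    """Bytes decompressed to read `events_wanted` scattered events."""
    cluster_size = cluster_events * event_size
    # Scattered picks land in distinct clusters until clusters run out.
    n_clusters = n_events // cluster_events
    touched = min(events_wanted, n_clusters)
    return touched * cluster_size

EVENT_SIZE = 1_000_000   # ~1 MB per event (assumed round number)
N_EVENTS = 10_000

# Picking 100 scattered events:
small = bytes_read(N_EVENTS, EVENT_SIZE, 5, 100)    # flush every 5 events
large = bytes_read(N_EVENTS, EVENT_SIZE, 30, 100)   # ~30 MB clusters
print(small // 10**6, "MB vs", large // 10**6, "MB")  # 500 MB vs 3000 MB
```

Under this model, flushing every 5 events makes the 100-event pick six times cheaper than 30-event clusters would, at the price of smaller baskets and a worse compression factor for sequential reading.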

8 Current settings: D3PDs
- 2010: fixed-size baskets (2 kB); reordered by event; basket sizes optimized properly; zip level changed to 6; all done in the merge step.
- 2011 until now: ROOT basket-size optimization; autoflush at 30 MB; no information on whether re-optimization is done or not (need to check!).
- 17.X.Y: not clear yet.
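The basket-size optimization mentioned above can be approximated by a simple budgeting rule (a sketch of the idea, not ROOT's actual algorithm, and with hypothetical branch names and per-event sizes): share the tree's memory budget among branches in proportion to how many bytes each writes per event, so that every basket holds roughly the same number of events and the branches flush in step.

```python
# Sketch of basket-size budgeting (not ROOT's actual OptimizeBaskets).
# Hypothetical per-event branch sizes in bytes.
branch_rates = {"el_pt": 400, "el_eta": 400, "jet_E": 3200, "trk_d0": 28000}

def optimize_baskets(rates, budget):
    """Split `budget` bytes among branches proportionally to their rates."""
    total = sum(rates.values())
    return {name: max(1, budget * r // total) for name, r in rates.items()}

sizes = optimize_baskets(branch_rates, budget=30 * 1024 * 1024)
for name, size in sizes.items():
    events_per_basket = size // branch_rates[name]
    print(f"{name}: {size} B basket, ~{events_per_basket} events/basket")
```

With a proportional split every branch ends up holding about the same number of events per basket, which is what makes event-aligned flushing (and hence event-contiguous file layout) possible.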

9 Local disk performance: D3PD
- When reading all events, real time is dominated by CPU time.
- Not so for sparse reading.
- The ROOT-optimized file (rewritten using hadd -f6) improves CPU time but not HDD time (!)
[Plot: 2010 setup marked "we are here now"]

10 Large scale tests: D3PD reading
- Egamma dataset: 11 files, 90 GB
- Tests: 100% of events; 1% of events; TTreeCache on; ROOT-optimized files

11 EOS: xroot disk pool
- Experimental setup for a large-scale analysis farm (caveat: real-life performance will be significantly worse).
- Xroot server with 24 nodes, each with 20 x 2 TB RAID-0 filesystems (only 10 nodes were used for this test, with a maximum theoretical throughput of 1 GB/s).
- To stress it, 23 x 8 cores were used, with ROOT 5.26.0b (SLC4, gcc 3.4).
- Only PROOF reading of D3PDs was tested.

12 EOS: xroot disk pool (cont.)
- Shown are only the maximal sustained event rates (real use-case averages will be significantly smaller).
- Original files: it would be faster to read all the events even when only 1% are needed.
- Reading the fully optimized data gave a sustained read speed of 550 MB/s.
[Plot: note the log scale!]

13 dCache vs. Lustre
- Tested in Zeuthen and Hamburg
- Minimum-bias D3PD data; HDD read requests:

          TTC    Test 1    Test 2
  dCache  No     173394     40547
  dCache  Yes        44
  Lustre  No     173394     40504
  Lustre  Yes       193        97

- Test 1: single unoptimized file (ROOT 5.22, 1k branches of 2 kB, CF=1)
- Test 2: single optimized file (ROOT 5.26, hadd -f2)

14 Conclusions
- There are many possible ways and parameters to optimize data for faster input.
- Different formats and use cases with sometimes conflicting requirements make optimization more difficult.
- In 2010 we used file reordering, which significantly decreased job duration and stress on the disk systems.
- Currently taken data are optimized by ROOT, but that may be suboptimal for some D3PDs.
- We need new performance measurements and a search for optimal settings: DPM, Lustre, dCache.
- Careful job-specific tuning is needed to reach optimal performance.

