Slide 1: Resulting Figures of Performance Tests on I/O Intensive ALICE Analysis Jobs
ALICE-FAIR Computing Meeting, 29/04/2008

Slide 2:
Source Analysis Data
- 1 million pp collisions, generated with AliROOT v4-05-Rev-03
- 10000 AliESDs.root files, each containing 100 events (see the chaining sketch below)
- 1 file = 100 events = 3.3 MB; 10000 files = 1 000 000 events = 33.6 GB
Source Analysis Code
- Train of 3 AliAnalysisTasks (one is the "ALICE Offline Bible" example, the rest are "real-life" examples)
- Tasks process all events in the whole dataset (I/O intensive: all tree branches are read, except FMD)
Analysis Environment
- ROOT version 5.15/06
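For orientation, the sketch below shows how such a dataset is typically chained in ROOT before a train of tasks runs over it. Only the AliESDs.root file name, the 10000-file count, and the standard esdTree tree name come from the slide; the directory layout and function name are placeholders.

```cpp
// Minimal sketch (not from the slides): chaining 10000 AliESDs.root files
// into one TChain so an analysis train can iterate over all 1 000 000 events.
// The directory layout ("data/run%04d/AliESDs.root") is hypothetical;
// "esdTree" is the standard tree name inside AliESDs.root.
#include "TChain.h"
#include "TString.h"

TChain* BuildEsdChain(int nFiles = 10000)
{
  TChain* chain = new TChain("esdTree");
  for (int i = 0; i < nFiles; ++i) {
    chain->Add(Form("data/run%04d/AliESDs.root", i));
  }
  return chain;  // 10000 files x 100 events = 1 000 000 events
}
```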

Slide 3:
Test Cases
- local disk – each job processes its own dataset from local disk storage
- remote – each job processes its own dataset from a single remote fileserver, accessed via the xrootd protocol (see the access sketch below)
- Lustre – each job processes its own dataset from a locally mounted partition of a Lustre cluster
Test Hardware
- test server – 8 cores, RAID5 comprised of 4 disks, readahead set to 1/8 MB (out of a maximum of 8 MB)
- remote fileserver – 1 core, 2 RAID5 controllers for 8 and 6 disks
- Lustre – cluster of 13 fileservers (17 for the last test)
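To illustrate the three access paths, here is a hedged sketch of how a single ESD file would be opened in each case from ROOT. The host name and mount points are placeholders; only the root:// (xrootd) URL scheme and the fact that local disk and Lustre are reached through ordinary POSIX paths are standard.

```cpp
// Sketch only: the server name, mount points and paths below are hypothetical.
// ROOT's TFile::Open dispatches on the URL scheme, so the same analysis code
// can read from local disk, from an xrootd fileserver, or from a Lustre mount.
#include "TFile.h"

void OpenEsdExamples()
{
  // local disk - plain POSIX path
  TFile* fLocal  = TFile::Open("/data/local/AliESDs.root");
  // remote fileserver - xrootd protocol (hypothetical host name)
  TFile* fRemote = TFile::Open("root://fileserver.example//data/AliESDs.root");
  // Lustre - looks like a local path because the cluster is POSIX-mounted
  TFile* fLustre = TFile::Open("/lustre/alice/AliESDs.root");
  // ... read esdTree from whichever file was opened ...
  if (fLocal)  fLocal->Close();
  if (fRemote) fRemote->Close();
  if (fLustre) fLustre->Close();
}
```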

Slide 4:
Test Objective: Difference in speed between parallel jobs (train of tasks) processing local data, parallel jobs processing data from a remote fileserver, and parallel jobs processing data from a Lustre cluster. One analysis job ("train of tasks") processes 10000 files of 100 events each.
Questions so far:
- Why is data analysis on ESD files so slow?
- Why doesn't the data throughput rate scale for parallel analysis jobs?

Slide 5:
Test Objective: Highest achievable data throughput limited by the hardware of the test servers; aggregate data throughput for parallel C++ jobs. One C++ job "processes" 100 ESD files of 100 events each: it reads the ESD files as binary data in chunks of 1 KB (sketched below). Hardware limits are measured with hdparm.
Questions so far:
- Why is data throughput so low for ROOT code?
- Why doesn't data throughput scale for parallel ROOT jobs?
Observations so far:
- The data throughput rate for reading with a plain C++ program is higher than with ROOT, and it scales.
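The benchmark code itself is not shown on the slides; the following minimal sketch assumes that "reads ESD data files as binary data by chunks of 1 KB" means a plain fread loop like this. File names are hypothetical; only the 1 KB chunk size and the 100 files per job come from the slide.

```cpp
// Minimal sketch of the kind of plain C++ "reader" job described on the slide:
// it opens each ESD file as an ordinary binary file and pulls it through a
// 1 KB buffer, measuring raw I/O throughput without any ROOT deserialization.
#include <cstdio>
#include <cstddef>

long long ReadFileInChunks(const char* path)
{
  FILE* f = std::fopen(path, "rb");
  if (!f) return -1;

  char buffer[1024];              // 1 KB chunks, as on the slide
  long long total = 0;
  size_t n;
  while ((n = std::fread(buffer, 1, sizeof(buffer), f)) > 0) {
    total += static_cast<long long>(n);   // data is discarded, only read speed matters
  }
  std::fclose(f);
  return total;                   // bytes read from this file
}

int main()
{
  long long sum = 0;
  for (int i = 0; i < 100; ++i) {          // 100 ESD files per C++ job
    char path[256];
    std::snprintf(path, sizeof(path), "data/AliESDs_%03d.root", i);  // hypothetical names
    long long bytes = ReadFileInChunks(path);
    if (bytes > 0) sum += bytes;
  }
  std::printf("read %lld bytes in total\n", sum);
  return 0;
}
```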

Slide 6:
Test Objective: The influence of ESD file size on data throughput for a ROOT analysis job. Source ESD files are merged into larger files of different sizes (a merging sketch follows below).
Questions so far:
- What if we run the analysis on larger ESD files?
Observations so far:
- For data files with 1000 events each, the data throughput rate increases by 1/3.
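The slide does not say which tool performed the merging; one standard way to merge ROOT files, shown here purely as a sketch, is ROOT's TFileMerger (the hadd command-line tool is equivalent). File names and the number of inputs per output are placeholders.

```cpp
// Hedged sketch: merge a group of small AliESDs.root files (100 events each)
// into one larger file, e.g. 10 inputs -> 1000 events per output file.
// Input/output names are hypothetical; TFileMerger is standard ROOT.
#include "TFileMerger.h"
#include "TString.h"

void MergeEsds(int nInputs = 10)
{
  TFileMerger merger;
  merger.OutputFile("AliESDs_merged.root");            // hypothetical output name
  for (int i = 0; i < nInputs; ++i) {
    merger.AddFile(Form("data/AliESDs_%03d.root", i)); // hypothetical input names
  }
  merger.Merge();   // writes the merged esdTree into the output file
}
```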

Slide 7:
Test Objective: Data throughput comparison for the analysis of ESD files located on local disk, on a remote fileserver, and on a Lustre cluster. Large ESD files are ~525 MB (20000 events per file); small ones are ~3.3 MB (100 events per file).
Questions so far:
- How can the data throughput rate be increased?
Observations so far:
- With larger data files, the data throughput rate increased and started to scale with the number of jobs.
- For unidentified reasons, analysis jobs on data stored on the remote fileserver regularly crash with a "segmentation violation" error.

Slide 8:
Test Objective: Aggregate data throughput rate for parallel analysis jobs processing ESD files located on a single disk. One analysis job ("train of tasks") processes one ESD file of 20000 events (~520 MB file size).
Questions so far:
- What if we don't use RAID5 but store the data on separately mounted disks? What is the trade-off?
Observations so far:
- Analyzing ESDs from a single disk is as fast as analyzing from a RAID5 comprised of 4 disks.

Slide 9:
Test Objective: Comparison of aggregate data throughput rates for parallel analysis jobs processing large ESD files stored on local RAID5 storage and on 4 local disks that are mounted separately. For the latter, jobs are distributed equally among the disks. RAID5 readahead is set to the maximum.
Questions so far:
- How can data throughput rates be increased in another way?
Observations so far:
- Using a JBOD configuration increases performance significantly.
- But keeping an eye on the distribution of datasets among the disks is tricky.

Slide 10:
Test Objective: Improvement of aggregate data throughput for parallel analysis jobs processing large ESD files located on a RAID5 by caching data files beforehand. Data files are read into memory with a C++ function before reading is performed by ROOT inside the analysis code (a sketch of such a function follows below).
Questions so far:
- Combining all tweaks, what is the maximum achievable throughput rate?
Observations so far:
- Caching data with the C++ function beforehand improves data throughput rates.
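The caching function itself is not shown; the sketch below is an assumed reconstruction in which "reading the file into memory beforehand" simply means pulling it once through a buffer so that ROOT's subsequent reads are served from the OS page cache. Buffer size and paths are arbitrary placeholders.

```cpp
// Hedged sketch of a pre-caching helper: read the whole ESD file once and
// discard the data, so the file ends up in the OS page cache and ROOT's
// subsequent reads inside the analysis job hit RAM instead of the disk.
// This is an assumed reconstruction, not the actual function used in the tests.
#include <cstdio>
#include <vector>

bool WarmFileCache(const char* path)
{
  FILE* f = std::fopen(path, "rb");
  if (!f) return false;

  std::vector<char> buffer(1024 * 1024);   // 1 MB buffer (arbitrary choice)
  while (std::fread(buffer.data(), 1, buffer.size(), f) > 0) {
    // data is thrown away; the point is that the kernel now caches the pages
  }
  std::fclose(f);
  return true;
}

// Usage inside the analysis job, before ROOT opens the file:
//   WarmFileCache("/data/raid5/AliESDs_large.root");   // hypothetical path
//   TFile* f = TFile::Open("/data/raid5/AliESDs_large.root");
```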

Slide 11:
Test Objective: Comparison of aggregate data throughput rates for parallel analysis jobs processing ESD files stored on local RAID5 storage and on 4 local disks that are mounted separately. Data files are cached with the C++ function before reading is performed by ROOT inside the analysis code.
Observations so far:
- Sticking to the current ESD file size, with prior caching and a JBOD configuration it is possible to increase performance by almost a factor of 9 for 12 jobs per node. With a larger file size the increase would be even more dramatic.

Slide 12:
Test Objective: Aggregate data throughput for analysis jobs running on all available slots of 53 computing nodes sharing the alice-t3 LSF queue. One analysis job ("train of tasks") processes 10000 files of 100 events each, stored on the Lustre cluster. Data files are cached with the C++ function before reading is performed by ROOT inside the analysis code.
Observations so far:
- Each job reading from Lustre gets an equal throughput rate, which depends on file size.
- As many jobs can run as the network link capacity to Lustre allows, in this case 10 Gb/s.

Slide 13: Summary
Current situation:
- 16 MB/s for one job reading from local disk
- No increase with the addition of parallel jobs reading from local disk
Possible improvement directions:
- Increasing the size of data files
- Letting parallel jobs read from separately mounted disks
- Preloading data files and letting the ROOT process read them out of RAM
http://wiki.gsi.de/cgi-bin/view/Grid/AnalysisSpeedTests

Slide 14: Extended Summary, Pt. 1
- The data throughput for one ESD analysis job (ALICE Offline Bible task example) processing data files from local disk storage is around 16 MB/s, which corresponds to an analysis speed of around 535 events/s.
- Data throughput for a ROOT job is half that of a plain C++ job that reads into memory.
- Scaling of the data throughput rate for parallel ROOT analysis jobs on local disk data is first observed when files are merged into much larger ones (from 3.3 MB to ~525 MB).
- A subset of parallel jobs running the analysis on data stored on a remote fileserver runs into a ROOT segmentation violation error. It may be 4 jobs or 1 job, but it happened in all test reruns.

Slide 15: Extended Summary, Pt. 2
- Increasing the RAID5 readahead to 8 MB raises the maximum possible data throughput rate of the hardware from 70 MB/s (default readahead) to 180 MB/s, but that doesn't affect the throughput rate for ESD analysis jobs that read whole events. Possible reason: too many branches.
- When using 4 separately mounted SATA disks, the aggregate data throughput for all disks is the sum of each disk's throughput capability: in this case, 70 MB/s for one disk, 280 MB/s for 4 disks together.
- When using separately mounted disks, saturation of the data throughput rate for parallel ESD jobs starts at 2 jobs per disk. The rate peaks at 96 MB/s for 12 parallel jobs processing large ESD files located on 4 separately mounted disks.
- Although using 4 separately mounted disks improves data throughput rates for parallel analysis jobs compared to RAID5, data safety and the maintenance effort of such an analysis scheme are called into question.
- Caching data files with the C++ function, so that the ROOT application reads the data from memory before the analysis starts, improves the data throughput rate for parallel jobs processing small ESD files on a single local disk. This method also significantly improves the data throughput rate for parallel jobs processing large and small ESD files on RAID5, although for large ESD files the rate is twice as high.

