
1 Grand Challenge and PHENIX Report
Post-MDC2 studies of GC software
– feasibility for day-1 expectations of the data model
– simple robustness tests
– comparisons to the data carousel … not yet
GC meeting highlights, plans
Possible additions/enhancements of GC software
– HPSS savvy
– distributed processing???

5 Cache use studies
– single query with a long processing time: ~0.2 MB/s
– cache kept near 1 GB; processing continuous

6 GC Test conditions: Singles
Single queries
– 100 events/file
– 100-200 MB/file
– varying cache size, processing time, query size
HPSS status
– 1 tape drive
– fast purge policy (5 minutes) to isolate GC capabilities
– pftp of files generally 4-8 MB/s, though the total rate is closer to 1-3 MB/s
– HPSS savvy should improve this by a factor of 2-3!

7 Multiple queries
Single query
– 0.0 < rndm < 0.2, time ~ 2000 s
Overlapping queries:
– 0.0 < rndm < 0.2
– 0.1 < rndm < 0.3
– time ~ 3000 s
– overlapping files staged to disk first (see the sketch below)
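The "overlapping files staged to disk first" behavior amounts to ordering the combined file list of the two queries so that shared files come before query-specific ones. Below is a minimal sketch of that idea, not the GC's actual implementation; the function name and the plain file-list representation are assumptions for illustration.

```cpp
// Sketch: order staging so files needed by BOTH queries are fetched first,
// letting both queries start processing from the same cached copies.
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

std::vector<std::string> stagingOrder(const std::vector<std::string>& queryA,
                                      const std::vector<std::string>& queryB) {
    std::set<std::string> a(queryA.begin(), queryA.end());
    std::set<std::string> b(queryB.begin(), queryB.end());

    std::vector<std::string> shared, rest;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(shared));
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::back_inserter(rest));

    std::vector<std::string> order = shared;          // overlapping files first
    order.insert(order.end(), rest.begin(), rest.end());
    return order;
}
```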

8 Test conditions: Doubles
Double queries
– identical queries at the same time, delayed, with different processing times
– overlapping queries

9 Robustness
Start query
– normal: 8 GB / 37 files
– ctrl-C after the first 2 files: 3rd file staged to cache, stops cleanly
– start identical query
– ctrl-C at 14 files: 15th file staged to cache, stops cleanly
– different query
– etc., etc.
– Very robust!
– Troubles only when the Objectivity lockserver fails!!

10 GC meeting highlights
Post-MDC2, the GC commandeered HPSS and performed some tests:
– robustness, correctness, tape drive dependencies, 1 P.I.P. link to user code, etc.
CAS plans: how does this affect/change GC capabilities?
Interface of GC with physics analyses.
Scalability issues -- tests to commence in July
– 1000's of files, 10,000 events/file, 5 components, 2 TB total
Generic interfaces
– quest to make the GCA as independent of our specific problem as possible -- usability for other HEP experiments, climate modeling, combustion studies, etc.

11 First-year offline configuration
As best I can tell, a fundamentally different configuration than STAR:
– Day 1 (many sub-detectors):
  ~10 detector sub-groups with their own files for calibration purposes
– Year 1 (many physics analyses):
  60 X the rate of STAR -> smaller DST events (~100 kB/event)
  no physical separation of events into components (maybe hits???)
  » single component, at least for year 1 (caveat on later slides)
  a couple thousand events/file
  since any physics analysis will in general cut no tighter than 1%, unless we filter events into separate files according to physics cuts, every major physics analysis query will want every file (see the estimate below)
– Prefiltering adds excessive complications and the need to correct for biases, etc.
  » possible exception: centrality presorting -- 4-5 bins
  » according to trigger conditions, detector configuration, date, etc.
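A quick back-of-the-envelope check of the "every query wants every file" claim, assuming roughly 2000 events per file and a 1% selection that is uncorrelated with file boundaries (illustrative numbers, not measurements):

```cpp
// Estimate the chance that a file could be skipped by a 1% physics cut.
#include <cmath>
#include <cstdio>

int main() {
    const double eventsPerFile = 2000.0;   // "a couple thousand events/file"
    const double cutEfficiency = 0.01;     // a 1% physics selection
    // Probability that a given file contains no selected events at all,
    // i.e. that the query could avoid staging that file.
    const double pSkip = std::pow(1.0 - cutEfficiency, eventsPerFile);
    std::printf("P(file contributes nothing) = %.2e\n", pSkip);  // ~1.9e-09
    return 0;
}
```

With numbers in this range, essentially every file contains events passing any percent-level cut, so a query over the full dataset touches the full file set.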

12 Projected use of G.C.
Intimately related to expectations of CAS machines
– Day 1 (many sub-detectors):
  separate calibration/special-run files
  separate machines
  different instances of G.C./D.C.
  separate cache, etc.
– Year 1 (physics analyses):
  some small/separate analyses
  – on CAS machines/server
  – but usually on micro-DSTs
  a few large jobs: need all/most files
  – running on CAS/server?
  – possibility of cache over several disks?
– Distributed processing
  – 10's of GB each
  – data spread out -- send analysis code to the machine
(diagram: HPSS, CAS, multi-CPU, big disk)

13 Partial query biases
Possible troubles with partial queries if they introduce a physics bias.
However, this is only a problem if we presort our data into files according to physics signals.
We may presort according to centrality in day-1/year-1.

14 Components?
A couple of possible ways to separate events into components.
Problems?
– each component corresponds to a separate file on tape: too many tape mounts?
(diagrams: Event -> Hits / Tracks / Raw / Global; Event -> Raw / mDST1 / DST / mDST2)
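To make the tape-mount concern concrete, here is a hypothetical sketch (all names invented, not the PHENIX event model) of what a per-event component index looks like when each component lives in its own file: a query touching N components implies fetching, and potentially mounting, N separate files per event set.

```cpp
// Hypothetical per-event component index: one file (and possibly one tape
// mount) per component that a query asks for.
#include <map>
#include <string>

struct ComponentRef {
    std::string file;   // file (and tape) holding this component
    long        offset; // location of this event's data within the file
};

struct EventIndex {
    long eventId;
    // e.g. "Hits", "Tracks", "Raw", "Global" or "Raw", "DST", "mDST1", "mDST2"
    std::map<std::string, ComponentRef> components;
};
```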


18 Objectivity woes
Surgically remove Objectivity
"We strongly recommend against using an ODBMS in those applications that are handled perfectly well with relational database(s)" -- Choosing an Object Database, Objectivity
– Lockserver problems: restarting often
– Movement of disk to rnfs01: rebuilding of the Objectivity tagDB
– Possible alternatives? ROOT?
  – Robustness: multiple accesses
    » each node of the farm and the CAS machines accessing the same file
    » a layer between the reconstruction nodes and the DB?
  – Scalability:
    » 100 GB tagDB? -- chain of files? (see the sketch below)
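For the "chain of files" scalability question, one way it could look in ROOT is a TChain over many modest per-run tag files acting as a single logical tag database. This is only a sketch under assumptions: the tree name, file naming, and the "rndm" tag variable (borrowed from the query tests earlier in the talk) are not an actual PHENIX schema.

```cpp
// Sketch: treat many per-run tag files as one logical tagDB with a TChain.
#include <cstdio>
#include "TChain.h"

void tagQuery() {
    TChain tags("TagTree");          // hypothetical tag tree name
    tags.Add("tagdb_run*.root");     // hypothetical per-run tag files
    // Count events passing an example tag selection.
    Long64_t n = tags.GetEntries("rndm > 0.0 && rndm < 0.2");
    std::printf("selected %lld events\n", (long long)n);
}
```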

19 Carousel
(architecture diagram labels: "ORNL" software, carousel server, mySQL database, filelist, HPSS tape, HPSS cache, pftp, carousel client, CAS, rmds01, rmds02, rmds03, "stateless")

20 Data Carousel: "Strip Mining"
Written and tested by Jerome Lauret and Morrison
– a layer in front of the ORNL batch transfer software, mostly in perl
  administration, organization, throttling
  integrates multiple user requests for specific files (not events)
  maintains a disk FIFO staging area (see the sketch below)
5-7 MB/s integrated rate!!
– G.C. is near 1-3 MB/s
  » tape optimization should clean that up
Missing:
– robust disk cache
– interface to physics analyses
– error checking
– etc.
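A minimal sketch of the request handling described above (request integration, FIFO staging, throttling), with all names hypothetical; the real carousel layer is perl in front of the ORNL software, so this only illustrates the bookkeeping, not the implementation.

```cpp
// Sketch: merge per-user file requests, deduplicate, and serve them through
// a bounded FIFO staging area.
#include <cstddef>
#include <deque>
#include <set>
#include <string>

class StagingQueue {
public:
    explicit StagingQueue(std::size_t maxStaged) : maxStaged_(maxStaged) {}

    // Integrate a user's request: each file is queued only once, no matter
    // how many users ask for it (requests are for files, not events).
    void request(const std::string& file) {
        if (queued_.insert(file).second) fifo_.push_back(file);
    }

    // Throttle: hand out the next file to stage only if the staging area
    // still has room for another file.
    bool nextToStage(std::string& file, std::size_t currentlyStaged) {
        if (fifo_.empty() || currentlyStaged >= maxStaged_) return false;
        file = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t maxStaged_;
    std::deque<std::string> fifo_;   // FIFO staging order
    std::set<std::string> queued_;   // files already requested
};
```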

21 Interface with physics analysis
Offline code:
– DSTs written in ROOT
– tagDB also in ROOT (probably), not OBJY -- no better alternative
Interface to GC:
– return file number / event number
– capability to run a query at the ROOT prompt?
– possibility to return the list of events when a bundle is cached?
  to keep continual interaction with the GC to a minimum (see the sketch below)
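A hypothetical sketch of what such an interface could look like from the analysis side (every name here is invented): the query hands back (file, event) pairs once a bundle of files is cached, so the analysis job only needs to talk to the GC once per bundle.

```cpp
// Sketch of a bundle-oriented query interface for analysis code.
#include <string>
#include <vector>

struct EventRef {
    std::string file;   // staged file on local disk
    long        event;  // event number within that file
};

class GCQuery {
public:
    explicit GCQuery(const std::string& tagSelection) : selection_(tagSelection) {}

    // Would block until the next bundle of files is cached, then return every
    // (file, event) pair in that bundle passing the tag selection.
    // Stub here; a real version would talk to the GC index and cache manager.
    std::vector<EventRef> nextBundle() { return {}; }

private:
    std::string selection_;
};

// Intended use from an analysis job (or the ROOT prompt):
//   GCQuery q("0.0 < rndm < 0.2");
//   for (auto bundle = q.nextBundle(); !bundle.empty(); bundle = q.nextBundle())
//       for (const auto& ev : bundle) { /* open ev.file, read ev.event */ }
```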

