1 New developments in the HENP-GC HENP-GC Collaboration New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC.

1 New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC
David Malon, Argonne, presenting for the HENP-GC collaboration* (http://www-rnc.lbl.gov/GC/)
8 Feb 2000, CHEP
*Slides thanks to L. Bernardo, D. Olson, A. Shoshani, S. Vanyashin

2 Outline
Overview of HENP-GC
What’s new since CHEP’98
STACS
Experiment interface
Scalability testing
Conclusion

3 High-Energy & Nuclear Physics Grand Challenge
3-year project, mid-1997 to mid-2000
Funded by DOE/MICS with contributed effort from DOE/HENP
Participants:
  –NERSC/Berkeley Lab: L. Bernardo, A. Mueller, H. Nordberg, A. Shoshani, A. Sim, J. Wu
  –Argonne: D. Malon, E. May, G. Pandola
  –Brookhaven Lab: B. Gibbard, S. Johnson, J. Porter, T. Wenaus
  –Nuclear Science/Berkeley Lab: D. Olson, A. Vaniachine, J. Yang, D. Zimmerman

4 What is the Grand Challenge architecture?
An order-optimized prefetch architecture for data retrieval from multilevel storage in a multiuser environment
Queries select events and specific event components based upon tag attribute ranges
  –query estimates are provided prior to execution
  –collections as queries are also supported
Because event components are distributed over several files, processing an event requires delivery of a “bundle” of files
Events are delivered in an order that takes advantage of what is already on disk and of multiuser policy-based prefetching of further data from tertiary storage
GCA intercomponent communication is CORBA-based, but physicists are shielded from this layer

5 System Overview
[Diagram. Labeled elements: Client (two), GCA, STACS, Index, File Catalog, Event Tags, staged event files, (other) disk-resident event data, HPSS, pftp.]

6 STorage Access Coordination System (STACS)
[Diagram. Components: Query Estimator, Query Monitor, Cache Manager, Policy Module, Bit-Sliced Index, File Catalog. Labeled flows: query estimate; query status, cache map; file bundles, event lists; list of file bundles and events; requests for file caching and purging; pftp and file purge commands.]

7 What is new since CHEP’98
Multi-component multi-file event model
  –Event is composed of separate components
  –Components of a single event are stored in separate files
Removal of Objectivity/DB-specific dependencies
  –eventID is an experiment-specific typedef
  –CORBA file catalog interface
User-accessible file bundle information for user-code-dependent file I/O
CORBA interface to tag database
Scalability tests
  –10M events, 7 components, 100-250 queries

8 Multiple-Component Events
Event Components
  –partition each event into 5-10 pieces
  –tracks, hits, vertices …
Queries can request one or more components
  –all components of an event must be in disk cache at the same time
Problem: how to manage multiple component files to minimize re-caching of files
Pseudo query language
    SELECT tracks, hits
    FROM Run17
    WHERE glb_trk_tot>0 & glb_trk_tot<10 & n_vert_total<3
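The WHERE clause of the pseudo query is a conjunction of range conditions on tag attributes. A minimal C++ sketch of that selection step follows, assuming a hypothetical `EventTag` record that holds only the two attributes named on the slide. The real system evaluates such conditions through its index, which this linear scan does not attempt to model.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tag record: only the attributes used in the slide's query.
struct EventTag {
    int eventID;
    int glb_trk_tot;
    int n_vert_total;
};

// WHERE glb_trk_tot>0 & glb_trk_tot<10 & n_vert_total<3
bool qualifies(const EventTag& t) {
    return t.glb_trk_tot > 0 && t.glb_trk_tot < 10 && t.n_vert_total < 3;
}

// Returns the IDs of all events whose tags satisfy the range conditions.
std::vector<int> selectEvents(const std::vector<EventTag>& tags) {
    std::vector<int> ids;
    for (std::size_t i = 0; i < tags.size(); ++i) {
        if (qualifies(tags[i])) ids.push_back(tags[i].eventID);
    }
    return ids;
}
```

The SELECT list (tracks, hits) does not affect which events qualify; it only determines which component files have to be delivered for them.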

9 Example of multiple components
[Diagram: events e1 to e9, each with a component A and a component B, stored in files of component A and files of component B (File 1 to File 4).]
File Bundles: (F1,F2: e1,e2,e3,e5), (F3,F2: e4,e7), (F3,F4: e6,e8,e9)
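One way to read the bundle list is as a grouping of events by the set of files each event needs. The sketch below illustrates that reading only; the `FileSet` typedef and `formBundles` function are hypothetical names, not part of the Grand Challenge software.

```cpp
#include <map>
#include <string>
#include <vector>

// Files needed by one event, one per requested component, in component order.
typedef std::vector<std::string> FileSet;

// Groups event IDs by the set of files that must be cached together.
std::map<FileSet, std::vector<int> > formBundles(
        const std::map<int, FileSet>& filesOfEvent) {
    std::map<FileSet, std::vector<int> > bundles;
    for (const auto& entry : filesOfEvent) {
        bundles[entry.second].push_back(entry.first);
    }
    return bundles;
}
```

Fed with the nine events of the slide (e1, e2, e3, e5 needing F1 and F2; e4, e7 needing F3 and F2; e6, e8, e9 needing F3 and F4), this grouping produces the three bundles listed above.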

10 Multicomponent Event Delivery
Grand Challenge software
  –partitions the collection of qualifying events according to which file bundles must be cached on disk to permit their processing
  –attempts to optimize the order of bundle delivery in a multiuser environment
  –supports prefetching of bundles

11 File Weight Policy for Multi-Component Events
File weight (bundle) = 1 if the file appears in the bundle, = 0 otherwise
Initial file weight = SUM of file weight (bundle) over all bundles of each query, over all queries
Dynamic file weight: the file weight for a file in a bundle that was processed is decremented by 1
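Read literally, the initial weight of a file is the number of bundles, counted over all queries, in which it appears, and each processed bundle lowers the weight of its files by one. A small sketch under that reading follows; the type and function names are illustrative, and a file is assumed to occur at most once per bundle.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::vector<std::string> Bundle;   // files in one bundle
typedef std::vector<Bundle> QueryBundles;  // all bundles of one query

// Initial weight: number of bundles, over all queries, that contain the file.
std::map<std::string, int> initialWeights(
        const std::vector<QueryBundles>& queries) {
    std::map<std::string, int> weight;
    for (const QueryBundles& q : queries)
        for (const Bundle& b : q)
            for (const std::string& f : b)
                ++weight[f];
    return weight;
}

// Dynamic weight: once a bundle has been processed, each of its files loses 1.
void bundleProcessed(const Bundle& b, std::map<std::string, int>& weight) {
    for (const std::string& f : b) --weight[f];
}
```

A file whose dynamic weight reaches zero is no longer needed by any pending bundle, which is what makes the weight usable as a purge criterion.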

12 Caching Policy (1)
Query service policy
  –round robin
  –query is skipped if no bundle fits available cache
  –when query is skipped, a skip_service counter is incremented
  –if counter is above preset limit, all activity stops until this query is serviced
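The query service policy can be sketched as a round-robin scan with a per-query skip counter. The code below is an illustration under stated assumptions, not the STACS implementation: `QueryState`, `pickQuery`, and the `hasFittingBundle` flag are hypothetical, and resetting the counter when a query is serviced is a choice made here that the slide does not specify.

```cpp
#include <cstddef>
#include <vector>

struct QueryState {
    bool hasFittingBundle;  // assumed to be derived elsewhere from cache state
    int  skipCount;         // the slide's skip_service counter
};

// Returns the index of the query to service next, or -1 if nothing can be
// serviced now. `next` is the round-robin cursor.
int pickQuery(std::vector<QueryState>& queries, std::size_t& next,
              int skipLimit) {
    // A query over the limit blocks all others until it can be serviced.
    for (std::size_t i = 0; i < queries.size(); ++i) {
        if (queries[i].skipCount > skipLimit) {
            if (!queries[i].hasFittingBundle) return -1;
            queries[i].skipCount = 0;  // assumption: reset once serviced
            return static_cast<int>(i);
        }
    }
    // Otherwise plain round robin, skipping queries with no fitting bundle.
    for (std::size_t n = 0; n < queries.size(); ++n) {
        std::size_t i = (next + n) % queries.size();
        if (queries[i].hasFittingBundle) {
            queries[i].skipCount = 0;
            next = (i + 1) % queries.size();
            return static_cast<int>(i);
        }
        ++queries[i].skipCount;  // skipped: no bundle fits available cache
    }
    return -1;
}
```

The blocking branch is what keeps a query with large bundles from starving: once its counter exceeds the limit, no other query is serviced, so cache space can free up for it.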

13 Caching Policy (2)
Bundle caching policy
  –select bundle with most files in cache
  –if a tie, select bundle with highest weight
  –if not enough space in cache, select next bundle that fits in cache
  –if none fit in cache, select next bundle with one less file in cache, etc.
  –if no bundles found, skip query, and increase skip_service counter
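Taken together, these rules select, among the bundles that fit in the free cache space, the one with the most files already cached, with ties broken by the higher weight. A compact sketch of that selection follows. `BundleInfo` and `selectBundle` are hypothetical names, and the bundle weight is assumed to be precomputed, since the slides do not spell out how it is derived from the file weights.

```cpp
#include <cstddef>
#include <vector>

struct BundleInfo {
    int    filesInCache;  // files of this bundle already on disk
    int    weight;        // bundle weight, assumed precomputed
    double bytesToStage;  // total size of the files still to be fetched
};

// Returns the index of the bundle to cache next, or -1 if none fits
// (in which case the caller skips the query).
int selectBundle(const std::vector<BundleInfo>& bundles, double freeBytes) {
    int best = -1;
    for (std::size_t i = 0; i < bundles.size(); ++i) {
        const BundleInfo& b = bundles[i];
        if (b.bytesToStage > freeBytes) continue;  // does not fit in cache
        if (best < 0) {
            best = static_cast<int>(i);
            continue;
        }
        const BundleInfo& cur = bundles[best];
        if (b.filesInCache > cur.filesInCache ||
            (b.filesInCache == cur.filesInCache && b.weight > cur.weight)) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A return value of -1 corresponds to the last rule on the slide: no bundle was found, so the query is skipped and its skip_service counter is increased.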

14 Caching Policy (3)
File purging policy
  –files are in 2 categories:
     file currently in use
     file not currently in use
  –purge file with lowest dynamic_file_weight
  –if a tie, purge largest file
Pre-fetching policy
  –Initially: unlimited
  –this parameter can be assigned dynamically
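The purge rule can be sketched as a scan for the file with the lowest dynamic weight, breaking ties in favor of the largest file. The sketch assumes that only files not currently in use are candidates, which the two categories suggest but the slide does not state outright. `CachedFile` and `selectPurgeVictim` are illustrative names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct CachedFile {
    std::string name;
    int    dynamicWeight;  // the slide's dynamic_file_weight
    double sizeBytes;
    bool   inUse;
};

// Returns the index of the file to purge, or -1 if every file is in use.
int selectPurgeVictim(const std::vector<CachedFile>& cache) {
    int victim = -1;
    for (std::size_t i = 0; i < cache.size(); ++i) {
        const CachedFile& f = cache[i];
        if (f.inUse) continue;  // assumption: files in use are never purged
        if (victim < 0) {
            victim = static_cast<int>(i);
            continue;
        }
        const CachedFile& v = cache[victim];
        if (f.dynamicWeight < v.dynamicWeight ||
            (f.dynamicWeight == v.dynamicWeight && f.sizeBytes > v.sizeBytes)) {
            victim = static_cast<int>(i);
        }
    }
    return victim;
}
```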

15 File Tracking Log
[Figure: file tracking log. Annotations: Query 1 starts here; Query 2 starts here; bundle (3 files) formed, then passed to query; bundle was found in cache; bundle shared by two queries.]

16 STAR event model
[Figure: T. Ullrich, Jan. 2000]

17 Interfacing GCA to experiment
[Diagram. Region labels: GC System, GCA Interface, STAR Components. Element labels: QueryEstimator, QueryMonitor, CacheManager, IndexBuilder, gcaClient, FileCatalog, IndexFeeder, StIOMaker, fileCatalog, tagDB, database.]

18 Experiment-specific Implementations
IndexFeeder utility
  –Experiment provides a “tag database” that holds, for each event, the attributes used for event selection (“tags”) and a fileID for each event component
  –IndexFeeder reads the experiment’s “tag database” so that the GC-provided “index builder” can create the index
FileCatalog server
  –FileCatalog queries the “file catalog” database of the experiment to translate fileID to HPSS & disk path

19 Client-side implementation
gcaClient interface
  –gcaResources interface: initialization, configuration, establishment of contact with remote STACS components
  –QueryObject: query definition, estimation, execution
  –Order-Optimized Iterator: delivery of event ids (and optional file information) as bundles are cached

20 Iterator Extensions
    // The iterator is initialized with this query's token and a
    // pointer to GCA_Resources for access to remote STACS
    // components and configuration parameters:
    OrderOptIter GCIter(query->token(), &GCA_Resources);
    eventID myEvent;                   // filled in by next(), passed by reference
    while (GCIter.next(myEvent)) {
        usercode(myEvent);             // process an event
    }
To let experiment-specific code handle file I/O, an optional flag signals a new file bundle, and a method retrieves the file name for each event component:
    bool thisIsANewBundle;             // set by next() when a new bundle starts
    while (GCIter.next(myEvent, thisIsANewBundle)) { ... }
    string getComponentFileName(const string componentName)
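To show how user code might react to the new-bundle flag, here is a self-contained sketch that uses a stand-in iterator. `MockIter` and `countFileOpens` are hypothetical and only imitate the two-argument `next` described above; the real `OrderOptIter` talks to remote STACS components, and `eventID` is an experiment-specific typedef (taken to be `int` here).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

typedef int eventID;  // assumption: stands in for the experiment's typedef

// Stand-in for OrderOptIter: replays a fixed list of (event, bundle index).
class MockIter {
public:
    explicit MockIter(const std::vector<std::pair<eventID, int> >& seq)
        : seq_(seq), pos_(0) {}

    // Mirrors the shape of next(eventID&, bool&) from the slide.
    bool next(eventID& ev, bool& thisIsANewBundle) {
        if (pos_ >= seq_.size()) return false;
        thisIsANewBundle =
            (pos_ == 0) || (seq_[pos_].second != seq_[pos_ - 1].second);
        ev = seq_[pos_].first;
        ++pos_;
        return true;
    }

private:
    std::vector<std::pair<eventID, int> > seq_;
    std::size_t pos_;
};

// Counts how often user code would (re)open its component files.
int countFileOpens(MockIter& it) {
    int opens = 0;
    eventID ev = 0;
    bool newBundle = false;
    while (it.next(ev, newBundle)) {
        if (newBundle) ++opens;  // real code would look up file names here
        (void)ev;                // placeholder for per-event processing
    }
    return opens;
}
```

In real client code the `newBundle` branch is where `getComponentFileName` would be called for each requested component, so files are reopened once per bundle rather than once per event.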

21 STAR uses the fileCatalog & instance tables in MySQL to satisfy the fcFileCatalog CORBA interface
    // fileCat.idl
    // Luis Bernardo, Alex Sim
    // Lawrence Berkeley National Laboratory
    // May 99
    // Purpose: defines interface between File Catalog (server) and Cache
    // Manager, Query Estimator and Query Monitor (clients).
    #include "smDefs.idl"
    struct FileInfo {
        FID_T fid;
        double fileSize;
        string localFileName;
        string remoteFileName;
        string tapeID;
    };
    typedef FileInfo FILEINFO_T;
    interface fcFileCatalog {
        double getFileSize(in FID_T fid);
        double getSumFileSizes(in FIDSET_T fset);
        FSIZE_SET_T getListFileSizes(in FIDSET_T fset);
        FILEINFO_T getFileInfo(in FID_T fid);
        void updateFileCatalog(in string asciifilecat);
    };
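For illustration, part of this interface can be mimicked by an in-memory C++ class. This is a sketch only: `InMemoryFileCatalog` and its `add` method are hypothetical, `FID_T` is assumed to be `long` and `FIDSET_T` a vector of IDs because `smDefs.idl` is not shown, and throwing on an unknown ID is a choice made here. STAR's actual server answers these calls from its MySQL tables.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

typedef long FID_T;  // assumption: the real type is defined in smDefs.idl

// C++ mirror of the IDL struct FileInfo.
struct FileInfo {
    FID_T fid;
    double fileSize;
    std::string localFileName;
    std::string remoteFileName;
    std::string tapeID;
};

// Hypothetical in-memory stand-in for an fcFileCatalog server.
class InMemoryFileCatalog {
public:
    void add(const FileInfo& info) { files_[info.fid] = info; }

    // map::at throws std::out_of_range for an unknown fid.
    FileInfo getFileInfo(FID_T fid) const { return files_.at(fid); }
    double getFileSize(FID_T fid) const { return files_.at(fid).fileSize; }

    double getSumFileSizes(const std::vector<FID_T>& fset) const {
        double sum = 0.0;
        for (FID_T fid : fset) sum += getFileSize(fid);
        return sum;
    }

private:
    std::map<FID_T, FileInfo> files_;
};
```

A summed size of this kind is the sort of figure a cache manager needs when checking whether a bundle fits in the available cache.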

22 Schema of fileCatalog table
mysql> desc fileCatalog;
+--------------+-----------------------------------+------+-----+---------------------+----------------+
| Field        | Type                              | Null | Key | Default             | Extra          |
+--------------+-----------------------------------+------+-----+---------------------+----------------+
| prodType     | enum('unknown','daq','sim','job') |      | MUL | unknown             |                |
| prodName     | varchar(80)                       |      |     |                     |                |
| prodSerie    | int(11)                           |      |     | 0                   |                |
| prodInstance | int(11)                           |      |     | 0                   |                |
| fileSequence | int(11)                           |      |     | 0                   |                |
| dbServer     | enum('unknown','bnl','lbl')       |      |     | unknown             |                |
| eventType    | int(11)                           |      |     | 0                   |                |
| path         | varchar(64)                       |      |     |                     |                |
| fileName     | varchar(20)                       |      |     |                     |                |
| dataset      | varchar(64)                       |      |     |                     |                |
| size         | int(11)                           |      |     | 0                   |                |
| createTime   | datetime                          |      |     | 0000-00-00 00:00:00 |                |
| insertTime   | timestamp(10)                     | YES  |     | NULL                |                |
| Nevents      | mediumint(9)                      |      |     | 0                   |                |
| NevLo        | mediumint(9)                      |      |     | 0                   |                |
| NevHi        | mediumint(9)                      |      |     | 0                   |                |
| owner        | varchar(20)                       |      |     |                     |                |
| grp          | varchar(20)                       |      |     | star                |                |
| permit       | varchar(10)                       |      |     | -rw-r-----          |                |
| type         | varchar(20)                       |      |     |                     |                |
| component    | varchar(20)                       |      |     |                     |                |
| format       | varchar(10)                       |      |     |                     |                |
| site         | varchar(10)                       |      |     |                     |                |
| hpss         | enum('Y','N')                     |      |     | Y                   |                |
| status       | smallint(6)                       |      |     | 0                   |                |
| comment      | blob                              |      |     | NULL                |                |
| tape         | smallint(6)                       |      |     | 0                   |                |
| generation   | smallint(6)                       | YES  |     | 0                   |                |
| ID           | mediumint(9)                      |      | PRI | 0                   | auto_increment |
+--------------+-----------------------------------+------+-----+---------------------+----------------+
29 rows in set (0.08 sec)
Annotation: path to primary instance in HPSS

23 Schema of instance table
mysql> desc instances;
+--------------+---------------+------+-----+---------------------+-------+
| Field        | Type          | Null | Key | Default             | Extra |
+--------------+---------------+------+-----+---------------------+-------+
| fID          | int(9)        |      | PRI | 0                   |       |
| instance     | tinyint(4)    |      | PRI | 0                   |       |
| created      | datetime      |      |     | 0000-00-00 00:00:00 |       |
| volume       | varchar(30)   |      |     | n/a                 |       |
| path         | varchar(128)  |      |     | n/a                 |       |
| permit       | varchar(10)   | YES  |     | NULL                |       |
| owner        | varchar(20)   | YES  |     | NULL                |       |
| grp          | varchar(20)   | YES  |     | NULL                |       |
| hpss         | enum('Y','N') |      |     | Y                   |       |
| lastAccessed | datetime      |      |     | 0000-00-00 00:00:00 |       |
| site         | varchar(10)   |      |     | BNL                 |       |
| location     | varchar(10)   |      |     | rcf                 |       |
| tape         | smallint(6)   |      |     | 0                   |       |
| comment      | varchar(255)  | YES  |     | NULL                |       |
| enteredDB    | timestamp(10) | YES  |     | NULL                |       |
+--------------+---------------+------+-----+---------------------+-------+
15 rows in set (0.10 sec)
Annotation: disk path, location where cache manager puts file

24 Scalability testing
Test Dataset
  –10M events
  –7 event components
  –1.6 TB
  –4700 files
Query Estimator (QE) tested with up to 100 concurrent queries
Query Monitor (QM) tested with up to 250 concurrent queries
24-hour runs
Bugs were found & fixed, system ran OK

25 File processing by 100 queries [figure]

26 File stage requests [figure]

27 QE estimation times [figure]

28 Conclusion
HENP-GC has developed a system for optimized access to multi-component event data files stored in HPSS.
General CORBA interfaces are defined for interfacing with the experiment.
A client component encapsulates interaction with the servers and provides an ODMG-style iterator.
The system has been tested with up to 10M events, 7 event components, and 250 concurrent queries.
It is currently being integrated with the STAR experiment's ROOT-based I/O analysis system.

