23/04/2008, VLVnT08, Toulon, FR, April 2008, M. Stavrianakou, NESTOR-NOA: First thoughts for KM3NeT on-shore data storage and distribution facilities
Slide 2: Outline
- Data management role of the host site
- Data input and filtering
- Data organization
- Data management and distribution system and services
- Database considerations
- Conclusions
Slide 3: Data management role of the host site
[Diagram of the host-site data flow. Labelled components: Experiment and DAQ; Data Filter Farm (calibration, filtering, event building, quality monitoring); temporary storage; online data quality monitoring (local monitoring); local temporary storage of a raw-data subset (marked with a question mark); (semi)permanent storage; transfer to large computing centres, with a backup transfer route; control data and their (semi)permanent storage; local filtering, reconstruction and analysis; Associated Sciences data, with processing and (semi)permanent storage. Rate labels: raw data 1-10 Gb/s per DAQ node (~0.1 Tb/s total); 100 kb/s per DAQ node; minimum bandwidth 1 Gbps.]
Slide 4: Data management role of the host site
The host site:
- Hosts the experiment, the DAQ, the data quality monitoring services, the Data Filter Farm and the central data management services.
  - The central services include database servers, tape vaults and robots, bookkeeping systems and file catalogue services, data access and file transfer services, data quality monitoring systems and transaction monitoring daemons.
- Is equipped with a fast network connection (minimum 1 Gbps) to all major computing centres.
  - Note: this bandwidth estimate is a minimum and may have to be upgraded to 10 Gbps, depending on the data transfer requirements via the GRID.
- Runs the calibration, triggering and event building tasks on the Data Filter Farm and, optionally, part of the reconstruction.
- Hosts the Associated Sciences DAQ and Computing Centre, which offers the same data processing, management and distribution services.
- Is responsible for the smooth and efficient running of the above services and ensures timely data transfer to all major computing centres.
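A back-of-the-envelope check of what the 1 Gbps link has to carry can be built from the rates quoted in this talk: 1-10 Gb/s raw input per DAQ node, a background reduction of 10^4 to 10^5, and ~100 kb/s filtered output per DAQ node. This is a sketch, not a design calculation. The number of DAQ nodes is a hypothetical placeholder. It is set to 100 here, which at 1 Gb/s per node reproduces the ~0.1 Tb/s total on the host-site data-flow slide. "Several weeks" is taken as three weeks, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
"""Back-of-the-envelope sizing of the host-site data flow (sketch).

Rates are taken from the talk; N_DAQ_NODES and the three-week period are
hypothetical placeholders. Decimal units: 1 Gb/s = 1e9 bit/s, 1 GB = 1e9 bytes.
"""

RAW_RATE_PER_NODE_BPS = (1e9, 10e9)    # 1-10 Gb/s raw input per DAQ node
REDUCTION_FACTOR = (1e4, 1e5)          # background suppression quoted from the CDR
FILTERED_RATE_PER_NODE_BPS = 100e3     # ~100 kb/s filtered output per DAQ node
N_DAQ_NODES = 100                      # hypothetical, for illustration only

SECONDS_PER_BATCH = 20 * 60            # one 20-minute batch of data taking
SECONDS_PER_WEEK = 7 * 24 * 3600


def filtered_rate(raw_bps: float, reduction: float) -> float:
    """Output rate after the Filter Farm suppresses background by `reduction`."""
    return raw_bps / reduction


def volume_bytes(rate_bps: float, seconds: float) -> float:
    """Bytes accumulated by a continuous stream of `rate_bps` bits per second."""
    return rate_bps * seconds / 8


def link_utilisation(stream_bps: float, link_bps: float) -> float:
    """Fraction of a WAN link occupied by a continuous stream (>1 means it cannot fit)."""
    return stream_bps / link_bps


if __name__ == "__main__":
    # Both ends of the quoted ranges land on the same ~100 kb/s per node.
    for raw, red in zip(RAW_RATE_PER_NODE_BPS, REDUCTION_FACTOR):
        print(f"{raw / 1e9:.0f} Gb/s / {red:.0e} -> {filtered_rate(raw, red) / 1e3:.0f} kb/s")

    total_filtered = N_DAQ_NODES * FILTERED_RATE_PER_NODE_BPS
    total_raw = N_DAQ_NODES * RAW_RATE_PER_NODE_BPS[0]
    print(f"filtered output: {total_filtered / 1e6:.0f} Mb/s")
    print(f"per 20-min batch: {volume_bytes(total_filtered, SECONDS_PER_BATCH) / 1e9:.1f} GB")
    print(f"per 3 weeks: {volume_bytes(total_filtered, 3 * SECONDS_PER_WEEK) / 1e12:.2f} TB")
    print(f"1 Gbps link, filtered stream: {link_utilisation(total_filtered, 1e9):.0%}")
    print(f"1 Gbps link, raw stream: {link_utilisation(total_raw, 1e9):.0f}x capacity")
```

Under these assumptions the filtered stream is 10 Mb/s. That uses about 1% of a 1 Gbps link, gives about 1.5 GB per 20-minute batch, and about 2.3 TB over three weeks. The raw stream would exceed the same link a hundredfold, which is consistent with sending only filtered data off site.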
Slide 5: Data input and filtering
- All data are transferred to shore at a rate of ~1-10 Gb/s per DAQ node.
- The data are processed in real time by the Data Filter Farm at the host site (a few hundred PCs, each with a processor speed of a few GHz) for calibration, triggering and event building.
- According to the KM3NeT Conceptual Design Report (CDR), pattern recognition algorithms based on space-time relationships, acting on a snapshot of the data from the whole detector, reduce the background rates by a factor of 10^4 to 10^5.
  - The process involves calibration using local and extended clusters in the detector and is followed by event building.
  - When the data pass the triggering criteria, an event is built from the information of all optical modules within a time window around the hits that caused the trigger.
- The output data rate should be ~100 kb/s per DAQ node.
- The output data are stored on the Filter Farm disks, which should be able to hold at least a few 20-minute batches of data taking.
- The data are also transferred to temporary or semi-permanent storage, with volumes adequate for several weeks of data taking.
- Are all "raw" data lost forever? Could we evaluate a system in which at least part of them are saved for further study, at least while the backgrounds are not fully understood?
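The trigger-then-build step described on this slide can be illustrated with a minimal sketch. The CDR trigger uses space-time pattern recognition on a snapshot of the whole detector. For illustration only, it is replaced here by a much simpler multiplicity trigger: at least `min_modules` distinct optical modules hit within `window_ns`. The `Hit` record, the parameter names and the example values are hypothetical. The hits are assumed to be already calibrated and sorted by time.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    time_ns: int     # calibrated hit time (hypothetical unit: ns)
    module_id: int   # optical module that recorded the hit


def find_triggers(times, modules, window_ns, min_modules):
    """Simplified multiplicity trigger (a stand-in for the CDR's space-time algorithm).

    Returns the time of the first hit of each group in which at least
    `min_modules` distinct modules fire within `window_ns`.
    """
    triggers = []
    i = 0
    while i < len(times):
        j = bisect_right(times, times[i] + window_ns)  # hits in [t_i, t_i + window_ns]
        if len(set(modules[i:j])) >= min_modules:
            triggers.append(times[i])
            i = j                                      # do not re-trigger on the same hits
        else:
            i += 1
    return triggers


def build_events(hits, window_ns, min_modules, pre_ns, post_ns):
    """For each trigger, collect the hits of *all* modules in [t - pre_ns, t + post_ns]."""
    times = [h.time_ns for h in hits]
    modules = [h.module_id for h in hits]
    events = []
    for t in find_triggers(times, modules, window_ns, min_modules):
        lo = bisect_left(times, t - pre_ns)
        hi = bisect_right(times, t + post_ns)
        events.append(hits[lo:hi])
    return events


if __name__ == "__main__":
    hits = [Hit(0, 1), Hit(5, 2), Hit(8, 3),        # coincidence on 3 modules
            Hit(500, 4),                            # isolated background hit
            Hit(1000, 1), Hit(1003, 5), Hit(1010, 6), Hit(1012, 7)]
    for ev in build_events(hits, window_ns=20, min_modules=3, pre_ns=10, post_ns=100):
        print([h.time_ns for h in ev])
```

Running the example builds two events, [0, 5, 8] and [1000, 1003, 1010, 1012]. The isolated hit at 500 ns fires no trigger and belongs to neither event, which illustrates on a small scale the background rejection the slide describes. The event window (`pre_ns`, `post_ns`) is deliberately wider than the trigger window, so that the hits from all modules around the trigger are kept.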
Slide 6: Data organization
- Event data: event collections that are naturally grouped and analysed together, as determined by physics attributes: trigger, "raw", "Filter Farm Data", "reco", ntuple, etc.
- Control data: calibration, positioning and conditions data, which are accumulated and stored separately:
  1. Detector control system data
  2. Data quality/monitoring information
  3. Detector and DAQ configuration information
  4. Calibration and positioning information
  5. Environmental data
  6. Associated sciences data
- Data management system: the basic infrastructure and tools that allow KM3NeT institutes and physicists to locate, access and transfer the various forms of data in a distributed computing environment.
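One way to make the split between event data and control data concrete is a pair of record types keyed on the collection tiers and control-data categories listed on this slide. This is a sketch only. The field names, the run-number keys and the interval-of-validity convention are all assumptions and not part of the proposal. Under that convention a control record applies to runs with valid_from <= run < valid_to, which is common in HEP conditions databases.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataTier(Enum):
    """Event-data collections named on the slide."""
    TRIGGER = "trigger"
    RAW = "raw"
    FILTER_FARM = "Filter Farm Data"
    RECO = "reco"
    NTUPLE = "ntuple"


class ControlDataType(Enum):
    """The six control-data categories listed on the slide."""
    DETECTOR_CONTROL = 1
    DATA_QUALITY = 2
    CONFIGURATION = 3
    CALIBRATION_POSITIONING = 4
    ENVIRONMENTAL = 5
    ASSOCIATED_SCIENCES = 6


@dataclass
class EventCollection:
    """Event data grouped by physics attributes (hypothetical fields)."""
    name: str
    tier: DataTier
    trigger: str
    runs: list[int] = field(default_factory=list)


@dataclass
class ControlRecord:
    """Control data, stored separately from event data and valid for a run range.

    The [valid_from, valid_to) interval-of-validity convention is an assumption.
    """
    kind: ControlDataType
    valid_from: int
    valid_to: int
    payload: dict = field(default_factory=dict)

    def applies_to(self, run: int) -> bool:
        return self.valid_from <= run < self.valid_to


if __name__ == "__main__":
    # Hypothetical collection and positioning record covering runs 100-199.
    reco = EventCollection("run100-199/reco", DataTier.RECO, trigger="muon",
                           runs=list(range(100, 200)))
    pos = ControlRecord(ControlDataType.CALIBRATION_POSITIONING, 100, 200,
                        {"module_7_z_m": -2400.0})
    # Check that the positioning record covers every run of the collection.
    print(all(pos.applies_to(r) for r in reco.runs))
```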
Slide 7: Data management and distribution system and services
- Data management system components:
  - Dataset Bookkeeping System: which data exist?
  - Data Location Service: where are the data located?
  - Data Placement and Transfer System
  - Local File Catalogues
- Data access and storage systems:
  - Storage Element and File Transfer Services:
    - A Mass Storage System, and a Storage Resource Manager interface providing an implementation-independent way to access the Mass Storage System
    - A File Transfer Service scalable to the required bandwidth
  - I/O facilities for application access to the data
  - Authentication, authorization and audit/accounting facilities
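The two questions on this slide map onto two lookups: from a dataset to its files, and from a file to the sites that hold it. Below is a minimal in-memory sketch, assuming that a dataset is a list of logical file names (LFNs). The class names, LFN paths and site names are hypothetical. A real deployment would back both services with a database rather than Python dictionaries.

```python
class DatasetBookkeeping:
    """Answers 'Which data exist?': dataset name -> logical file names (LFNs)."""

    def __init__(self) -> None:
        self._files: dict[str, list[str]] = {}

    def register(self, dataset: str, lfns: list[str]) -> None:
        self._files.setdefault(dataset, []).extend(lfns)

    def datasets(self) -> list[str]:
        return sorted(self._files)

    def files(self, dataset: str) -> list[str]:
        return list(self._files.get(dataset, []))


class DataLocationService:
    """Answers 'Where are the data located?': LFN -> sites holding a replica."""

    def __init__(self) -> None:
        self._replicas: dict[str, set[str]] = {}

    def add_replica(self, lfn: str, site: str) -> None:
        self._replicas.setdefault(lfn, set()).add(site)

    def complete_sites(self, bookkeeping: DatasetBookkeeping, dataset: str) -> set[str]:
        """Sites holding every file of `dataset` (empty if the dataset is unknown)."""
        files = bookkeeping.files(dataset)
        if not files:
            return set()
        sites = set(self._replicas.get(files[0], set()))
        for lfn in files[1:]:
            sites &= self._replicas.get(lfn, set())
        return sites


if __name__ == "__main__":
    dbs = DatasetBookkeeping()
    dls = DataLocationService()
    dbs.register("run100/reco", ["/km3net/run100/reco_0.root", "/km3net/run100/reco_1.root"])
    for lfn in dbs.files("run100/reco"):
        dls.add_replica(lfn, "HOST-SITE")
    dls.add_replica("/km3net/run100/reco_0.root", "CC-A")  # partial copy only
    print(dbs.datasets(), dls.complete_sites(dbs, "run100/reco"))
```

Keeping the two services separate mirrors the slide: bookkeeping changes only when data are produced, while replica locations change whenever the Data Placement and Transfer System moves files.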
Slide 8: Database considerations
- A Mass Storage System implies the use of one or more database technologies.
- Database services must be based on scalable and reliable hardware and software.
- For the software, consider adopting packages and tools already in use in HEP, e.g. ROOT for event data and Oracle and/or MySQL for control data.
  - ROOT has proven reliable, flexible and scalable. It comes with a C++-like command-line interface, a rich graphical user interface, an I/O system, a parallel processing facility and a GRID interface. It is easy to learn for users and developers alike, and its long-term support and maintenance are guaranteed.
  - Oracle is the de facto relational database standard. MySQL and PostgreSQL are open source, hence free, and may be adopted if the cost of Oracle proves prohibitive. Their interoperability must be evaluated.
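As an illustration of control data in a relational database, the sketch below stores per-module calibration constants with a run-range validity and looks up the constant valid for a given run. Python's built-in `sqlite3` module stands in for Oracle, MySQL or PostgreSQL. The table name, columns, the "T0 offset" quantity and the run-range convention are hypothetical.

```python
import sqlite3

# Calibration constants keyed by optical module and validity interval.
# sqlite3 stands in for Oracle/MySQL/PostgreSQL; the table layout is hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calibration (
    module_id    INTEGER NOT NULL,
    valid_from   INTEGER NOT NULL,  -- first run the constant applies to
    valid_to     INTEGER NOT NULL,  -- first run it no longer applies to
    t0_offset_ns REAL    NOT NULL,
    PRIMARY KEY (module_id, valid_from)
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_t0(conn: sqlite3.Connection, module_id: int, valid_from: int,
             valid_to: int, t0_ns: float) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO calibration VALUES (?, ?, ?, ?)",
            (module_id, valid_from, valid_to, t0_ns),
        )


def t0_for_run(conn: sqlite3.Connection, module_id: int, run: int):
    """Return the T0 offset valid for `run`, or None if no constant covers it."""
    row = conn.execute(
        "SELECT t0_offset_ns FROM calibration "
        "WHERE module_id = ? AND valid_from <= ? AND ? < valid_to",
        (module_id, run, run),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = open_db()
    store_t0(conn, 7, 100, 200, 2.5)
    store_t0(conn, 7, 200, 300, 3.1)
    print(t0_for_run(conn, 7, 150), t0_for_run(conn, 7, 250), t0_for_run(conn, 7, 350))
```

The `?` placeholders are sqlite3's "qmark" parameter style. Other DB-API drivers use different styles: MySQLdb uses `%s` and cx_Oracle uses `:name`. This is one small, concrete instance of the interoperability question raised on the slide.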
Slide 9: Conclusions
1. We may need to start evaluating database options, as well as the various available implementations of the data management system components and services, regardless of the final computing model adopted: e.g. CASTOR, dCache, etc. for the Mass Storage System, GridFTP for data transfer, etc.
2. Obviously, data challenges can only be carried out once a more or less structured system is in place (together with the necessary software for event simulation, reconstruction and analysis); however, we could perhaps start formulating requirements for their scope and scale.
3. GRID? Which one? The LHC experiments are finally finding it quite useful for data transfer and distributed analysis. How do we proceed?