Slide 1: First thoughts for KM3Net on-shore data storage and distribution facilities
M. Stavrianakou, NESTOR-NOA
VLVnT08, Toulon, France, 23 April 2008

Slide 2: Outline
- Data management rôle of the host site
- Data input and filtering
- Data organization
- Data management and distribution system and services
- Database considerations
- Conclusions

Slide 3: Data management rôle of the host site
[Block diagram of the on-shore data flow] The experiment and DAQ deliver raw data at 1-10 Gb/s per DAQ node (~0.1 Tb/s total) to the Data Filter Farm (calibration, filtering, event building, quality monitoring) and to local monitoring. Filter output (~100 kb/s per DAQ node) goes to temporary storage, then to (semi)permanent storage and to transfer to the large computing centres (minimum bandwidth 1 Gbps, with a backup transfer route). Control data and associated sciences data also go to processing and (semi)permanent storage. Online data quality monitoring and local filtering, reconstruction and analysis run at the host site. Open question: local temporary storage of a raw data subset? (A rate cross-check is sketched below.)
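The per-node and total rates quoted on this slide can be cross-checked with a back-of-the-envelope calculation. The Python sketch below assumes of order 100 DAQ nodes; that node count is a hypothetical figure chosen only to make the quoted numbers consistent, not a value from the slides.

```python
# Back-of-the-envelope check of the rates quoted on slide 3.
# Assumption (hypothetical): ~100 DAQ nodes; the slide only gives per-node figures.
N_DAQ_NODES = 100
RAW_RATE_PER_NODE_GBPS = 1.0          # lower end of the quoted 1-10 Gb/s range
FILTERED_RATE_PER_NODE_KBPS = 100.0   # quoted filter farm output per node

raw_total_tbps = N_DAQ_NODES * RAW_RATE_PER_NODE_GBPS / 1000.0
filtered_total_mbps = N_DAQ_NODES * FILTERED_RATE_PER_NODE_KBPS / 1000.0

print(f"Total raw rate:      {raw_total_tbps:.2f} Tb/s")      # ~0.1 Tb/s, as on the slide
print(f"Total filtered rate: {filtered_total_mbps:.1f} Mb/s") # ~10 Mb/s towards storage
```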

Slide 4: Data management rôle of the host site
The host site:
- Hosts the experiment, the DAQ, the data quality monitoring services, the Data Filter Farm and the central data management services; the latter include database servers, tape vaults and robots, bookkeeping systems and file catalogue services, data access and file transfer services, data quality monitoring systems and transaction monitoring daemons
- Is equipped with a fast network connection (minimum 1 Gbps) to all major computing centres; note that this bandwidth estimate is conservative and may have to be upgraded to 10 Gbps depending on the data transfer requirements via the GRID (a rough capacity check is sketched after this list)
- Runs the calibration, triggering and event building tasks on the Data Filter Farm and, optionally, part of the reconstruction
- Hosts the Associated Sciences DAQ and Computing Centre, offering the same data processing, management and distribution services
- Is responsible for the smooth and efficient running of the above services and ensures timely data transfer to all major computing centres
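To see whether 1 Gbps is indeed conservative, the sketch below compares one day of filtered output against the link capacity; the node count and the usable link fraction are hypothetical assumptions, not figures from the slides.

```python
# Rough check of the 1 Gbps offsite link against the filtered data volume.
# Assumptions (hypothetical): ~100 DAQ nodes at 100 kb/s each, and only 50% of
# the link usable for bulk transfers (an arbitrary safety margin).
SECONDS_PER_DAY = 86_400
filtered_rate_mbps = 100 * 100 / 1000.0                               # ~10 Mb/s aggregate
daily_volume_gb = filtered_rate_mbps * SECONDS_PER_DAY / 8 / 1000.0   # gigabytes per day

link_gbps = 1.0
usable_fraction = 0.5
transfer_hours = daily_volume_gb * 8 / (link_gbps * usable_fraction * 3600)

print(f"Filtered data volume: ~{daily_volume_gb:.0f} GB/day")
print(f"Time to ship one day of filtered data: {transfer_hours:.2f} h")
```

Under these assumptions the filtered stream (~100 GB/day) fits comfortably in a 1 Gbps link; the case for 10 Gbps would be driven mainly by shipping raw-data subsets or re-processing output via the GRID.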

Slide 5: Data input and filtering
- All data are transferred to shore at a rate of ~1-10 Gb/s per DAQ node
- The data are processed in real time by the Data Filter Farm (a few hundred PCs with processor speeds of a few GHz each) at the host site for calibration, triggering and event building
- According to the CDR, pattern recognition algorithms based on space-time relationships, acting on a snapshot of the data of the whole detector, reduce the background rates by a factor of …; the process involves calibration using local and extended clusters in the detector and is followed by event building: when the data pass the triggering criteria, an event is built from the information of all optical modules in a time window around the hits causing the trigger
- The output data rate should be ~100 kb/s per DAQ node
- Output data are stored on Filter Farm disks, an operation which should be sustained for at least a few batches of 20 minutes of data taking; data are also transferred to temporary or semi-permanent storage on volumes adequate for several weeks of data taking (a rough sizing sketch follows this list)
- Are all "raw" data lost forever? Could we evaluate a system whereby at least part of them is saved for further study, at least while the backgrounds are not fully understood?
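To put "a few batches of 20 minutes" and "several weeks" into numbers, the sketch below estimates the disk needed on the Filter Farm and on the temporary/semi-permanent store; the node count, the number of batches kept and the number of weeks are assumptions made only for illustration.

```python
# Rough disk sizing for the Filter Farm output buffers (assumed parameters marked).
N_DAQ_NODES = 100            # assumed; not specified on the slides
OUT_RATE_KBPS = 100          # ~100 kb/s per DAQ node (from the slide)
BATCH_MINUTES = 20           # one data-taking batch (from the slide)
N_BATCHES_ON_FARM = 5        # "a few batches" kept on Filter Farm disks (assumed)
WEEKS_SEMI_PERMANENT = 4     # "several weeks" on temporary/semi-permanent storage (assumed)

aggregate_mb_per_s = N_DAQ_NODES * OUT_RATE_KBPS / 8 / 1000.0   # megabytes per second
farm_buffer_gb = aggregate_mb_per_s * BATCH_MINUTES * 60 * N_BATCHES_ON_FARM / 1000.0
semi_permanent_tb = aggregate_mb_per_s * WEEKS_SEMI_PERMANENT * 7 * 86_400 / 1e6

print(f"Filter Farm buffer:    ~{farm_buffer_gb:.0f} GB")
print(f"Semi-permanent store:  ~{semi_permanent_tb:.1f} TB")
```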

Slide 6: Data organization
- Event data = event collections naturally grouped and analysed together as determined by physics attributes: trigger, "raw", "Filter Farm data", "reco", ntuple, etc.
- Control data = calibration, positioning and conditions data which are accumulated and stored separately:
  1. Detector control system data
  2. Data quality/monitoring information
  3. Detector and DAQ configuration information
  4. Calibration and positioning information
  5. Environmental data
  6. Associated sciences data
- Data management system = the basic infrastructure and tools allowing KM3Net institutes and physicists to locate, access and transfer the various forms of data in a distributed computing environment (a possible metadata sketch follows this list)
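As a concrete illustration of this split, the sketch below represents event collections and control data sets as bookkeeping metadata in Python; the field names and classes are purely illustrative, not a proposed schema.

```python
# Illustrative metadata for the event/control data split described on slide 6.
from dataclasses import dataclass
from enum import Enum

class EventTier(Enum):
    TRIGGER = "trigger"
    RAW = "raw"
    FILTER_FARM = "filter_farm"
    RECO = "reco"
    NTUPLE = "ntuple"

class ControlCategory(Enum):
    DETECTOR_CONTROL = 1
    DATA_QUALITY = 2
    DETECTOR_DAQ_CONFIG = 3
    CALIBRATION_POSITIONING = 4
    ENVIRONMENTAL = 5
    ASSOCIATED_SCIENCES = 6

@dataclass
class EventCollection:
    name: str           # e.g. "run001234_reco" (hypothetical naming)
    tier: EventTier
    run_range: tuple    # (first_run, last_run)
    files: list         # logical file names known to the file catalogue

@dataclass
class ControlDataSet:
    category: ControlCategory
    valid_from: str     # interval of validity, ISO timestamps
    valid_to: str
    location: str       # e.g. database table or file reference
```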

Slide 7: Data management and distribution system and services
Data management system components:
- Dataset Bookkeeping System – which data exist?
- Data Location Service – where are the data located?
- Data Placement and Transfer System
- Local File Catalogues
- Data Access and Storage Systems
Storage Element and File Transfer Services:
- A Mass Storage System and a Storage Resource Manager interface providing an implementation-independent way to access the Mass Storage System
- A File Transfer Service scalable to the required bandwidth
- I/O facilities for application access to the data
- Authentication, authorization and audit/accounting facilities
(How these services could fit together is sketched after this list.)
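The sketch below shows one way the listed services could be chained for a typical user request (find data, locate them, replicate them to local storage). Every class and method name here is hypothetical: the slide names the services but does not define their interfaces.

```python
# Hypothetical interfaces for the services named on slide 7, and how a user
# request could flow through them. This is a sketch, not a design.

class DatasetBookkeeping:
    """Which data exist?"""
    def find_datasets(self, tier: str, run_range: tuple) -> list:
        ...

class DataLocationService:
    """Where are the data located?"""
    def locate(self, dataset: str) -> list:   # returns a list of storage elements
        ...

class FileTransferService:
    """Queue a replication between storage elements (e.g. via GridFTP)."""
    def replicate(self, dataset: str, source_se: str, dest_se: str) -> str:
        ...

def stage_for_analysis(bk, loc, fts, tier, run_range, local_se):
    """Locate all matching datasets and replicate them to the local storage element."""
    transfers = []
    for ds in bk.find_datasets(tier, run_range):
        sources = loc.locate(ds)
        if sources and local_se not in sources:
            transfers.append(fts.replicate(ds, sources[0], local_se))
    return transfers
```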

Slide 8: Database considerations
- A Mass Storage System implies the use of one or more database technologies
- Database services must be based on scalable and reliable hardware and software
- For the latter, consider adopting packages and tools already in use in HEP, e.g. ROOT for event data and ORACLE and/or MySQL for control data (an illustration follows this list):
  – ROOT has proven reliable, flexible and scalable; it comes with a C++-like command line interface and a rich graphical user interface as well as an I/O system, a parallel processing facility and a GRID interface; it is easy to learn for users and developers alike, and long-term support and maintenance are guaranteed
  – ORACLE is the de facto relational database standard; MySQL and PostgreSQL are open source, hence free, and may be adopted if cost concerns are prohibitive; interoperability must be evaluated
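As an illustration of the ROOT option for event data, the sketch below writes a toy event tree with PyROOT. The branch content (run number, hit count) is purely hypothetical and says nothing about the eventual KM3Net event model.

```python
# Minimal PyROOT sketch: persist a toy event structure in a ROOT TTree.
# The branches (run, nhits) are placeholders, not a proposed event format.
import ROOT
from array import array

f = ROOT.TFile("events.root", "RECREATE")
tree = ROOT.TTree("events", "toy filtered events")

run = array("i", [0])
nhits = array("i", [0])
tree.Branch("run", run, "run/I")
tree.Branch("nhits", nhits, "nhits/I")

for i in range(1000):          # fake events, for illustration only
    run[0] = 1
    nhits[0] = 10 + (i % 50)
    tree.Fill()

tree.Write()
f.Close()
```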

Slide 9: Conclusions
1. We may need to start evaluating database options as well as the various available implementations of the data management system components and services, regardless of the final computing model to be adopted, e.g. CASTOR, dCache, etc. for the Mass Storage System and GridFTP for data transfer (a transfer sketch follows this list)
2. Obviously, data challenges can only be carried out once a more or less structured system is in place (and, of course, the necessary software for event simulation, reconstruction and analysis); however, we could already start formulating requirements as to their scope and scale
3. GRID? Which one? The LHC experiments are finally finding it quite useful for data transfer and distributed analysis. How do we proceed?
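For the GridFTP option mentioned in point 1, a bulk transfer could eventually be driven by a thin wrapper such as the sketch below. It assumes the standard globus-url-copy client and a valid grid proxy; the storage-element endpoints and paths are placeholders, not real KM3Net hosts.

```python
# Hedged sketch: drive a third-party GridFTP transfer with globus-url-copy.
# Endpoints are placeholders; assumes a valid grid proxy is already available.
import subprocess

def gridftp_copy(src_url, dst_url, streams=4):
    """Copy one file between GridFTP endpoints using parallel streams."""
    cmd = ["globus-url-copy", "-vb", "-p", str(streams), src_url, dst_url]
    subprocess.run(cmd, check=True)

gridftp_copy(
    "gsiftp://se.host-site.example/km3net/filtered/run001234.root",
    "gsiftp://se.computing-centre.example/km3net/filtered/run001234.root",
)
```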