BESIII data processing

BESIII data processing. Ziyan Deng, 2016-06-06, Conference on High Energy Physics Computing and Software, Dongguan, Guangdong

BESIII dataflow
Raw data on disk (All, Bhabha, Dimu, Random trigger, ...)
Raw data on tapes
Detector Calibration
Detector Simulation
Offline Reconstruction (DST Production)
Background Mixing (with random trigger events)
Physics Skimming (nprong, DTag, ...)
Reconstruction
User Analysis

Key components for data processing
Raw data from detector
Data management system
Offline software system
Data quality system
Database server
Computing resources

BESIII data volume
Resonance   Raw files   Data volume (RAW)   Data volume (DST)
psip        32000       80T                 27T
jpsi        45600       85T                 29T
psipp       90000       170T                56T
4040        9500        18T                 6T
XYZdata     60000       120T                40T
Rscan       43000       21T (only one volume listed)
tauscan     2000        8T                  2.6T
2175        12000       22T                 5T
4180        (no values listed)
Total size of random trigger data: 40T
~100 TB of raw data (Physics + Random + CAL) per year

Raw data on Lustre file system
~2 GB per raw data file
Hundreds of raw files per day, including All, dimu, bhabha, diphoton, and random trigger data

Raw data on Lustre file system: random trigger data, raw data, data for calibration

The architecture of Bookkeeping
Bookkeeping consists mainly of three parts:
A database that keeps all the data information physicists may be interested in.
A bookkeeping server that provides services to access the database.
Bookkeeping client tools for ordinary users to query data.
The bookkeeping (bkk) services are hosted by a central server that handles both web pages and XML-RPC services.
Diagram labels: Client Side, Server Side, Database; BookkeepingSvc, XML-RPC, HTTP, Bookkeeping Server, XML-RPC service, JSP (JavaServer Pages), JDBC, MySQL DB
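The three-part layout described above (database, server-side service, client query) can be sketched in a few lines. This is only an illustration under stated assumptions: `sqlite3` stands in for the MySQL database, the request is dispatched in-process instead of over HTTP, and the table layout, the `files_in_run_range` service name, and the file paths are hypothetical, not the actual bookkeeping schema or API.

```python
import sqlite3
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

# Database: sqlite3 is a stand-in for MySQL so the sketch runs anywhere.
# The table and the paths are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_files (run_id INTEGER, stream TEXT, path TEXT)")
db.executemany(
    "INSERT INTO raw_files VALUES (?, ?, ?)",
    [
        (100, "all", "/hypothetical/run_0000100_All_file001.raw"),
        (101, "all", "/hypothetical/run_0000101_All_file001.raw"),
        (102, "random", "/hypothetical/run_0000102_Random_file001.raw"),
        (205, "all", "/hypothetical/run_0000205_All_file001.raw"),
    ],
)


def files_in_run_range(run_from, run_to, stream):
    """Server-side service: file paths for runs in [run_from, run_to]."""
    rows = db.execute(
        "SELECT path FROM raw_files "
        "WHERE run_id BETWEEN ? AND ? AND stream = ? ORDER BY run_id",
        (run_from, run_to, stream),
    )
    return [path for (path,) in rows]


# Server side: expose the service through an XML-RPC dispatcher.
dispatcher = SimpleXMLRPCDispatcher(allow_none=True)
dispatcher.register_function(files_in_run_range, "files_in_run_range")


def call(method, *params):
    """Client side: marshal the request, dispatch it, unmarshal the reply.

    A real client would send the same XML over HTTP, for example with
    xmlrpc.client.ServerProxy; here the transport is skipped.
    """
    request = xmlrpc.client.dumps(params, methodname=method).encode("utf-8")
    response = dispatcher._marshaled_dispatch(request)
    (result,), _ = xmlrpc.client.loads(response)
    return result


paths = call("files_in_run_range", 100, 150, "all")
print(paths)
```

Keeping the database behind a service means client tools only need the XML-RPC endpoint, not database credentials.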

Data management
Management of raw data
Import information about raw data files from the online database
File and dataset management: provide an interface for dataset access

Data management
Copy raw data from Castor to disk (Lustre):
Get information about the raw data from Bookkeeping: position in Castor, runID, ...
Create a dataset: runFrom, runTo, dataset name
The dataset name is the input of a data migration job script
Submit the job
After the job has finished, check the consistency of the raw data files
Commands:
cd /bes3fs/offline/data/cpfromcastor/round09
mkdir date
cd date
/afs/ihep.ac.cn/soft/common/sysgroup/offline/bin/CpFromCastor -c ~/bin/TypeConfig.cfg -d date dataset REAL q2n chkcopy SeqNo
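The slide does not say how the consistency check after the copy is done. One common approach, shown here purely as an assumption, is to compare the size and a checksum of each source file with its copy. The function names and the demo file names below are hypothetical and are not part of the CpFromCastor tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return (size in bytes, SHA-256 hex digest), reading in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def check_copy(source_dir, dest_dir):
    """Compare each file in source_dir with the same-named file in dest_dir.

    Returns a dict: file name -> 'ok', 'missing', or 'mismatch'.
    """
    report = {}
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file():
            continue
        dst = Path(dest_dir) / src.name
        if not dst.is_file():
            report[src.name] = "missing"
        elif file_digest(src) != file_digest(dst):
            report[src.name] = "mismatch"
        else:
            report[src.name] = "ok"
    return report


def demo():
    """Build a tiny source/destination pair and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "source"), Path(tmp, "dest")
        src.mkdir()
        dst.mkdir()
        (src / "run_a.raw").write_bytes(b"event data A")
        (dst / "run_a.raw").write_bytes(b"event data A")  # good copy
        (src / "run_b.raw").write_bytes(b"event data B")
        (dst / "run_b.raw").write_bytes(b"event data")  # truncated copy
        (src / "run_c.raw").write_bytes(b"event data C")  # never copied
        return check_copy(src, dst)


print(demo())
```

Reading in chunks keeps memory use flat, which matters for files of about 2 GB each.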

Calibration constants version control
Management of calibration constants
Save calibration constants for a specific sub-detector, software version, and run range
Interface for users to search for specific constants
Permission control for different users
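A lookup keyed by sub-detector, software version, and run range might look like the sketch below. The record fields, the version strings, the file names, and the rule that the most recently registered entry wins when run ranges overlap are all assumptions made for illustration; the slide does not describe the actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConstantsRecord:
    sub_detector: str       # e.g. "MDC"
    software_version: str   # hypothetical version label
    run_from: int           # first run covered (inclusive)
    run_to: int             # last run covered (inclusive)
    file_name: str          # hypothetical constants file


def find_constants(
    records: List[ConstantsRecord],
    sub_detector: str,
    software_version: str,
    run: int,
) -> Optional[ConstantsRecord]:
    """Return the constants valid for this sub-detector, version, and run.

    Assumption: if several entries match, the last one registered wins.
    """
    matches = [
        r
        for r in records
        if r.sub_detector == sub_detector
        and r.software_version == software_version
        and r.run_from <= run <= r.run_to
    ]
    return matches[-1] if matches else None


records = [
    ConstantsRecord("MDC", "release-A", 1000, 1999, "mdc_A_v1.root"),
    ConstantsRecord("MDC", "release-A", 1500, 1999, "mdc_A_v2.root"),
    ConstantsRecord("MDC", "release-B", 1000, 1999, "mdc_B_v1.root"),
    ConstantsRecord("EMC", "release-A", 1000, 2999, "emc_A_v1.root"),
]

hit = find_constants(records, "MDC", "release-A", 1600)
print(hit.file_name if hit else None)
```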

BESIII Offline Software System (BOSS)
The BESIII Offline Software System (BOSS) is a new offline data processing software system built on the GAUDI framework.
External libraries: Geant4, ROOT, GDML, MySQL, ...
OS: Scientific Linux 6, GCC 4.6.3
Simulation, calibration, reconstruction, and analysis algorithms are the core software for data processing and physics analysis; the software framework provides these algorithms with an event data service and a constants data service.
Diagram labels: Physics Analysis; Simulation, Calibration, Reconstruction; physics constant service, calibration constant service, detector geometry service; BESIII Offline Database; Event Data Service; raw data and raw data converter; REC data and REC data converter; DST data and DST data converter

Reading calibration constants
root file: bemp puts the root file into the db (~bemp/SqlTest/CalConstSqlHelper.cxx); table offlinedb MdcCalConst
Read from root file: $CALIBROOTCNVROOT/src/cnv/RootMdcCalibDataCnv.cxx
TCDS
Read db (sql): $CALIBUTILROOT/src/Metadata.cxx
getter, setter: $CALIBDATAROOT/src/Mdc/MdcCalibData.cxx
Read TTree from sql results: $CALIBTREECNVROOT/src/cnv/TreeMdcCalibDataCnv.cxx
MdcCalibFunSvc

Database architecture

Database performance
Servers:
Central database servers: 1 master and 5 slaves at IHEP, other slaves at other groups
Replication of DAQ and DCS database
Web server for data quality and bemp
Bookkeeping database and web server
Central database servers:
Size: 35G (database files, logs)
Throughput: 2 connections per second, more than 200 queries per second (statistics from one slave only)
Connections | 636619
Innodb_data_read | 437587968
Uptime | 2933932
Replication of DAQ and DCS: size 970G
BEMP database server: size 11G
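The three counters on this slide look like MySQL status variables, where `Uptime` is in seconds and `Innodb_data_read` is in bytes. Under that assumption, a lifetime-average rate can be worked out as below. The average comes to roughly 0.22 connections per second over about 34 days, which is lower than the 2 connections per second quoted on the slide; the slide does not say how its figure was measured, so it may refer to a different interval.

```python
# Counters as quoted on the slide (from one slave).
status = {
    "Connections": 636619,          # connection attempts since start
    "Innodb_data_read": 437587968,  # bytes read by InnoDB since start
    "Uptime": 2933932,              # seconds since server start
}

SECONDS_PER_DAY = 86400

# Lifetime averages derived from the counters.
uptime_days = status["Uptime"] / SECONDS_PER_DAY
connections_per_second = status["Connections"] / status["Uptime"]
innodb_read_mib = status["Innodb_data_read"] / 2**20

print(f"uptime: {uptime_days:.1f} days")
print(f"average connections per second: {connections_per_second:.3f}")
print(f"InnoDB data read: {innodb_read_mib:.1f} MiB")
```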

Data Quality Assurance
Several kinds of MC samples are generated and reconstructed: J/psi -> e+e-, mu+mu-, rhopi, KsKpi, PPbar
Part of the real data is reconstructed to check the software performance and MC/data consistency

Data Production
Data production uses the validated offline software release
Physics production takes place once or twice per year, with ~5 months of processing time for each production
Reconstruction of newly taken data runs from the beginning to the end of each data-taking round, depending on when the calibration constants of the sub-detectors are ready

BESIII data processing
Computing resources at IHEP:
CPU: ~5000 cores
Tape space (Castor): 4 PB, 3 PB available
Local file system (Lustre): ~2800 TB, ~300 TB available
CPU time of production jobs (with 2000 cores):
Produce 1 billion jpsi inclusive MC DST events: 8 days
Reconstruct 1 billion jpsi raw data events: 7 days
Reconstruct 0.1 billion psip raw data events: 1 day
Reconstruct 2.9 fb^-1 of psipp raw data: 13 days
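The production times above can be turned into a rough per-event cost. The sketch assumes all 2000 cores are busy for the whole quoted period, which the slide does not state, so the results are upper bounds on CPU time per event rather than measured values. The psipp sample is left out because its size is given as integrated luminosity, not as an event count.

```python
SECONDS_PER_DAY = 86400
CORES = 2000  # cores used by production jobs, as quoted on the slide

# (number of events, wall-clock days) as quoted on the slide.
jobs = {
    "jpsi inclusive MC DST production": (1.0e9, 8),
    "jpsi raw data reconstruction": (1.0e9, 7),
    "psip raw data reconstruction": (0.1e9, 1),
}


def core_seconds_per_event(events, days, cores=CORES):
    """Core-seconds spent per event if every core is busy the whole time."""
    return days * SECONDS_PER_DAY * cores / events


costs = {name: core_seconds_per_event(n, d) for name, (n, d) in jobs.items()}

for name, cost in costs.items():
    print(f"{name}: {cost:.2f} core-seconds per event")
```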

Tag based analysis
A TAG describes basic information for each event
The location of the DST file is saved in the TAG file
Saves much disk space compared with skimming
Analysis speed is the same as with skimming
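The idea can be illustrated with a small sketch: each tag holds a few summary quantities plus a pointer to the event in its DST file, so a selection runs over the tags and only the chosen events are read. The tag fields (`n_charged`, `total_energy`), the file names, and the in-memory lists standing in for DST files are hypothetical; the real TAG content is not given on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class Tag:
    dst_file: str        # location of the DST file holding the event
    entry: int           # index of the event inside that file
    n_charged: int       # hypothetical summary quantity
    total_energy: float  # hypothetical summary quantity


def select(tags: List[Tag], predicate: Callable[[Tag], bool]) -> Dict[str, List[int]]:
    """Return {dst_file: [entries]} for the tags that pass the predicate."""
    wanted: Dict[str, List[int]] = {}
    for tag in tags:
        if predicate(tag):
            wanted.setdefault(tag.dst_file, []).append(tag.entry)
    return wanted


def read_selected(dst_files: Dict[str, list], wanted: Dict[str, List[int]]) -> Iterator:
    """Yield only the selected events; unselected entries are never touched."""
    for name, entries in wanted.items():
        events = dst_files[name]
        for index in entries:
            yield events[index]


# Stand-ins for two DST files and their tags.
dst_files = {
    "a.dst": ["evt-a0", "evt-a1", "evt-a2"],
    "b.dst": ["evt-b0", "evt-b1"],
}
tags = [
    Tag("a.dst", 0, 2, 3.1),
    Tag("a.dst", 1, 4, 2.9),
    Tag("a.dst", 2, 2, 1.0),
    Tag("b.dst", 0, 6, 3.0),
    Tag("b.dst", 1, 2, 3.0),
]

wanted = select(tags, lambda t: t.n_charged == 2 and t.total_energy > 2.5)
selected = list(read_selected(dst_files, wanted))
print(selected)
```

Unlike a skim, no events are copied: the selection result is just a list of file names and entry numbers, which is where the disk saving comes from.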

Multi-input data analysis
Retrieve DST and raw data in the same job
Raw data of each sub-detector can be retrieved independently

Summary
Large-scale data samples from BESIII have been successfully processed
The data management and offline software systems provide quick and stable data processing for BESIII