ALICE DATA ACCESS MODEL
PtP Network Workshop, 13.12.2012

Outline
- ALICE data model
- Some figures
- Infrastructure monitoring
- Replica discovery mechanism

ALICE data model
- Central catalogue of logical file names (LFN)
  - With owner:group and unix-style permissions
  - Size and MD5 checksum of each file
  - Metadata on subtrees
- Each LFN is associated with a GUID
- Any number of physical file names (PFN) can be associated with an LFN
  - Of the form root://<SE endpoint>//<HH>/<hhhhh>/<GUID>, where HH and hhhhh are hashes of the GUID
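To make the catalogue structure concrete, here is a minimal sketch of an entry tying one LFN to its GUID and replicas. The class, field names and example values are illustrative only, not the actual AliEn catalogue schema.

```python
# Illustrative sketch of a catalogue entry tying one LFN to its GUID and PFNs.
# The class, fields and values are made up for this example; they are not the
# actual AliEn catalogue schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogueEntry:
    lfn: str                # logical file name
    owner: str              # owner (unix-style permissions apply)
    group: str
    permissions: str        # e.g. "r-xr-xr-x"
    size: int               # bytes
    md5: str                # MD5 checksum of the file content
    guid: str               # globally unique identifier of the file
    pfns: List[str] = field(default_factory=list)   # any number of physical replicas

entry = CatalogueEntry(
    lfn="/alice/data/2012/example/AliESDs.root",    # hypothetical path, not a real entry
    owner="aliprod", group="aliprod", permissions="r-xr-xr-x",
    size=1_234_567_890, md5="0f343b0931126a20f133d67c2b018a3b",
    guid="6a1b2c3d-4e5f-4a6b-8c7d-9e0f1a2b3c4d",
)
# One PFN per replica; the path under the SE is derived from hashes of the GUID.
entry.pfns.append("root://storage.example.org:1094//12/34567/" + entry.guid)
print(entry.lfn, "->", len(entry.pfns), "replica(s)")
```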

ALICE data model (2)
- Data files are accessed directly
  - Jobs go to where a copy of the data is
  - Reading from the closest working replica to the job
- Exclusive use of the xrootd protocol
  - http, ftp and torrent are also supported for downloading other input files
- At the end of the job, N replicas are uploaded from the job itself (2x ESDs, 3x AODs, etc.)
- Scheduled data transfers for raw data only, using xrd3cp
  - T0 -> one T1 per run, selected at data-taking time
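As an illustration of the direct-access pattern, the sketch below opens a file over xrootd, assuming the XRootD Python bindings (pyxrootd) are available on the worker node; the URL is a placeholder, not a real ALICE replica.

```python
# Minimal sketch of direct file access over the xrootd protocol, using the
# XRootD Python bindings (pyxrootd). The URL is a placeholder, not a real
# ALICE replica.
from XRootD import client
from XRootD.client.flags import OpenFlags

url = "root://storage.example.org:1094//12/34567/some-guid"  # hypothetical PFN

f = client.File()
status, _ = f.open(url, OpenFlags.READ)
if not status.ok:
    raise RuntimeError("open failed: " + status.message)

# The job reads directly from the (closest working) replica; the file is not
# staged locally first.
status, data = f.read(offset=0, size=1024 * 1024)   # read the first MB
print("read", len(data), "bytes")
f.close()
```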

Some figures
- 60 disk storage elements + 8 tape-backed (T0 and T1s)
- 28 PB in 307M files (replicas included)
- 2012 averages:
  - 31 PB written (1.2 GB/s)
    - 2.4 PB of this is raw data (1.4 PB to T0, 1 PB replicated to the T1 set - only the good runs), so some 67 MB/s are needed for the raw data replication
    - On average ~half is written locally, ~half over the uplink
  - 216 PB read back (8.6 GB/s) - 7x the amount written, mostly from the local storage
  - Sustained periods of 3-4x the above
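A back-of-the-envelope check of how these yearly volumes map onto the quoted average rates; the effective accounting window (~300 days) is an assumption made for this sketch.

```python
# Back-of-the-envelope conversion of the 2012 volumes into average rates.
# The accounting window is an assumption (~300 days covered by the statistics);
# the slide quotes 1.2 GB/s written and 8.6 GB/s read.
PB = 1e15                      # bytes
seconds = 300 * 24 * 3600      # assumed accounting window

written = 31 * PB
read    = 216 * PB

print("average write rate: %.1f GB/s" % (written / seconds / 1e9))  # ~1.2 GB/s
print("average read rate : %.1f GB/s" % (read / seconds / 1e9))     # ~8.3 GB/s
print("read/write ratio  : %.1fx" % (read / written))               # ~7x
```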

Analysis train efficiency
- The most I/O-demanding processing
- Last 1M analysis jobs (mix of all types of analysis with aggressive scheduling)
  - 14.2M input files
  - 87.5% accessed from the site-local SE at 3.1 MB/s
  - 12.5% read from remote SEs at 0.97 MB/s, bringing the average processing speed down to 2.8 MB/s
- Average job efficiency was 70% for the average CPU power (in HepSpec06)
- So we would need at least 0.4 MB/s/HepSpec06 for the average analysis to be efficient
  - And the infrastructure to scale linearly with this …
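The quoted average processing speed follows directly from the local/remote split above; a one-line check:

```python
# Weighted average of the two access speeds quoted on the slide.
local_fraction,  local_rate  = 0.875, 3.1    # MB/s, read from the site-local SE
remote_fraction, remote_rate = 0.125, 0.97   # MB/s, read from remote SEs

avg_rate = local_fraction * local_rate + remote_fraction * remote_rate
print("average processing speed: %.1f MB/s" % avg_rate)   # ~2.8 MB/s
```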

A particular analysis task …
- An I/O-intensive analysis train run
- A small fraction of files accessed remotely
  - With the expected penalty
- However, the external connection is the lesser issue …

Aggregated SE traffic
- The previous I/O-intensive train (10 h)

Infrastructure monitoring
- On each site VoBox a MonALISA service collects:
  - Job resource consumption, WN host monitoring …
  - Host monitoring data of the local SEs (network traffic, load, sockets, etc.)
- And performs VoBox-to-VoBox measurements:
  - traceroute / tracepath (a minimal sketch follows below)
  - Single-stream available bandwidth measurements (FDT) - this is what impacts the job efficiency
- All results are archived, and the network topology is also inferred from them
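A minimal sketch of the VoBox-to-VoBox path measurement referenced above, calling the system tracepath and keeping the raw output as an archivable record. In production this is done by the MonALISA service; the target hostname here is a placeholder.

```python
# Minimal sketch of a VoBox-to-VoBox path measurement. In production this is
# performed by the MonALISA service; here we simply call the system tracepath
# and keep the raw output. The target hostname is a placeholder.
import subprocess, time, json

def trace(target: str, timeout: int = 60) -> dict:
    """Run tracepath to a remote VoBox and return a small archived record."""
    proc = subprocess.run(["tracepath", "-n", target],
                          capture_output=True, text=True, timeout=timeout)
    return {
        "target": target,
        "timestamp": int(time.time()),
        "returncode": proc.returncode,
        "output": proc.stdout.splitlines(),
    }

record = trace("vobox.remote-site.example.org")
print(json.dumps(record, indent=2)[:500])
```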

Network topology view in MonALISA

Available bandwidth per stream
- Plot annotations:
  - Funny ICMP throttling
  - Discrete effect of the congestion control algorithm on links with packet loss (x 8.39 Mbps)
  - Suggested larger-than-default buffers (8 MB) vs. default buffers (see the buffer sketch below)
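The buffer observation can be acted on at the application level; the sketch below shows how a test client might request 8 MB socket buffers. Whether the kernel actually grants them depends on the host's net.core.rmem_max / net.core.wmem_max limits.

```python
# Sketch of requesting 8 MB socket buffers, as suggested by the measurements
# above. Whether the kernel grants this depends on the net.core.rmem_max /
# net.core.wmem_max limits configured on the host.
import socket

BUF = 8 * 1024 * 1024   # 8 MB

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)

# Linux reports twice the requested value (kernel bookkeeping is included).
print("granted rcvbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
print("granted sndbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.close()
```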

Bandwidth test matrix
- 4 years of archived results for an 80x80 site matrix

Replica discovery mechanism
- The closest working replicas are used for both reading and writing
- SEs are sorted by the network distance to the client making the request (a simplified sketch follows below)
  - Combining network topology data with the geographical location
  - SEs that fail the respective functional test are left only as a last resort
  - Weighted by their recent reliability
- Writing is slightly randomized for a more 'democratic' data distribution
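A simplified sketch of this sorting logic. The scoring formula, weights and SE names are illustrative only; the real AliEn implementation combines network topology, geography and functional-test history.

```python
# Simplified sketch of the replica/SE sorting described above. The scoring
# formula and weights are illustrative, not the actual AliEn implementation.
import random
from dataclasses import dataclass

@dataclass
class SE:
    name: str
    distance: float      # network/geographical distance to the client (smaller = closer)
    reliability: float   # recent functional-test success rate, 0..1
    test_ok: bool        # did the last functional test pass?

def sort_ses(ses, for_writing=False):
    def score(se):
        s = se.distance / max(se.reliability, 0.01)   # closer and more reliable = lower score
        if for_writing:
            s *= random.uniform(0.9, 1.1)             # slight randomization for writes
        return (not se.test_ok, s)                    # failing SEs only as a last resort
    return sorted(ses, key=score)

candidates = [                                        # hypothetical SEs
    SE("Site1::SE", distance=1.0, reliability=0.99, test_ok=True),
    SE("Site2::SE", distance=3.0, reliability=0.95, test_ok=True),
    SE("Site3::SE", distance=2.0, reliability=0.50, test_ok=False),
]
print([se.name for se in sort_ses(candidates, for_writing=True)])
```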

Plans
- Encourage sites to improve their local infrastructure
  - This has the largest impact on the efficiency
  - E.g. too few xrootd gateways for large GPFS clusters, insufficient backbone capacity …
- Then provide as much information as possible for them to tackle the uplink problems
  - Deploy a similar test suite on the data servers
  - (Re)enable ICMP where it is missing
  - (Re)apply TCP buffer settings (a quick check is sketched below)
  - Accelerate the migration to SLC6
- But we only see the end-to-end results
  - How can we find out what happens in between?
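As a quick way to verify the TCP buffer settings on a Linux host, the sketch below reads the kernel limits from /proc and compares them with the 8 MB target suggested by the bandwidth tests; the set of keys checked is an assumption about which sysctls a site would tune.

```python
# Quick check that a Linux host allows the suggested 8 MB TCP buffers, by
# reading the kernel limits directly from /proc. The list of keys is an
# assumption about which sysctls sites typically tune.
TARGET = 8 * 1024 * 1024

def max_buffer(path):
    with open(path) as f:
        return int(f.read().split()[-1])   # tcp_rmem/tcp_wmem hold "min default max"

for key in ("/proc/sys/net/ipv4/tcp_rmem", "/proc/sys/net/ipv4/tcp_wmem",
            "/proc/sys/net/core/rmem_max", "/proc/sys/net/core/wmem_max"):
    value = max_buffer(key)
    status = "ok" if value >= TARGET else "too small"
    print("%-35s %12d  %s" % (key, value, status))
```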

Conclusions
- ALICE uses all resources uniformly
  - No dedicated SEs / sites / links for particular tasks
- This was an ambitious model that worked only because the network capacity exceeded the initial expectations
- The problems now are more at an operational level
  - We need more meaningful / realistic monitoring data to identify them, from the site fabric to the backbones
  - This is not a one-shot process