INTRODUCTION
The GRID Data Center at INFN Pisa hosts a large Tier-2 for the CMS experiment, together with local usage from other activities, both HEP-related and not. The Tier-2 has to fulfill, on one side, the activities mandated by the CMS Memorandum of Understanding and, on the other side, support as much as possible Italian physicists doing research.

Typical activities:
- MC production (MoU): ~1000 cores, 50 TB of temporary space
- Analysis via GRID: ~1000 cores, 800 TB
- Local analysis: ~50 users, ~500 cores under LSF, ~100 TB of local space
- Interactive work: ~50 users, a few cores each (mostly multi-threaded applications)

There are three ingredients to a successful large-scale data center: storage, CPUs and infrastructure.

Storage
A center used for such diverse activities must offer a solid storage infrastructure, accessible through all the protocols needed by the use cases:
- SRM for organized WAN transfers (we use StoRM, an Italian SRM implementation, and dCache for legacy files)
- At least one protocol for locally running GRID jobs (we offer dCap, POSIX and XrootD)
- At least one protocol for direct WAN (streaming) access (we offer XrootD from both dCache and StoRM)
- Interactive users, in the end, always prefer direct POSIX access (we offer that on StoRM)

The storage hardware:
- 2 DataDirect DDN9900 systems, with 300 disks each (>1 PB of raw storage)
- A total of 4x8 8 Gbit/s FC connections, served to the GPFS NSD servers via an FC switch
- 8 GPFS NSD servers with multipath enabled and 10 Gbit/s connectivity
- GPFS version 3.4.0, serving more than 1 PB
- A DDN EF3015 system dedicated to metadata handling
- A DataDirect 12K on arrival
- Throughput from storage measured in excess of 32 Gbit/s
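To make the protocol list concrete, here is a minimal sketch of the two user-level access patterns it implies: a WAN copy with the XrootD client tools versus plain POSIX access from the interactive machines. The hostnames and paths are hypothetical placeholders, not the actual Pisa endpoints.

```python
import os
import subprocess

# Hypothetical endpoints: the real Pisa XrootD door and GPFS mount point are
# not given in the poster, these are placeholders only.
XROOTD_URL = "root://xrootd.example.infn.it//store/user/someuser/ntuple.root"
POSIX_PATH = "/gpfs/cms/store/user/someuser/ntuple.root"

def fetch_via_xrootd(url, dest):
    """WAN access: copy the file with xrdcp, the standard XrootD client tool."""
    subprocess.check_call(["xrdcp", url, dest])

def read_first_bytes_posix(path, nbytes=1024):
    """Local access: interactive users simply open the StoRM/GPFS area as POSIX."""
    with open(path, "rb") as f:
        return f.read(nbytes)

if __name__ == "__main__":
    fetch_via_xrootd(XROOTD_URL, "./ntuple.root")
    print("copied", os.path.getsize("./ntuple.root"), "bytes via xrdcp")
    print("read", len(read_first_bytes_posix(POSIX_PATH)), "bytes via POSIX")
```

Analysis jobs usually skip the copy entirely and stream the root:// URL directly through ROOT; that pattern is sketched next to the architecture diagram at the end.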
CPU
The GRID data center in Pisa (~5000 cores) is engineered to keep the largest possible number of tasks running at any time, which means:
- No job is guaranteed to start instantaneously
- Every user or group of users can in principle saturate the whole farm
- We are happy to host GRID jobs from users of experiments not present in Pisa
- No restriction is imposed on jobs when the farm is not full
- When multiple jobs are queued, a fairshare implemented via LSF matches each group's agreed share of the resources (a toy sketch of the idea is shown after the interactive-node description below)
- The only case in which some resources are left unused is a small part of the cluster (~500 cores), which can be preempted by parallel jobs

Users also need specific machines to carry on their research work:
- The possibility to submit to the GRID
- The possibility to use the local farm
- The possibility to run a process interactively on a single core or on several cores, with guarantees on RAM availability

We decided to make a number of machines available, using the batch system as a load balancer: the batch system directs you to a machine where you get the number of cores and the amount of memory you asked for, and makes sure no other task interferes. When the interactive load is low, the same machines can be used as standard GRID processing nodes (a sketch of such an interactive request follows).
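As a concrete illustration of the "batch system as interactive load balancer" idea (and of the "Request 4 interactive cores in the UI cluster" screenshot on the poster), here is a minimal sketch of such a request. The queue name and memory reservation are hypothetical; only standard LSF options (-Is, -q, -n, -R) are assumed.

```python
#!/usr/bin/env python
# Sketch: ask LSF for an interactive slot with a given number of cores and
# reserved memory. Queue name "ui" and the 2000 MB figure are invented.
import subprocess
import sys

def interactive_session(cores=4, mem_mb=2000, queue="ui"):
    cmd = [
        "bsub",
        "-Is",                                          # interactive job with a pseudo-terminal
        "-q", queue,                                    # hypothetical interactive queue
        "-n", str(cores),                               # number of cores (slots)
        "-R", "span[hosts=1] rusage[mem=%d]" % mem_mb,  # all slots on one host, reserve RAM
        "/bin/bash",
    ]
    return subprocess.call(cmd)                         # inherits the terminal, returns on logout

if __name__ == "__main__":
    cores = int(sys.argv[1]) if len(sys.argv) > 1 else 4
    sys.exit(interactive_session(cores=cores))
```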
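For the scheduling policy described in the CPU section above, the following toy model sketches how a fairshare scheduler chooses which group's pending job to dispatch next: each group has a configured share, and its dynamic priority drops as its recent usage grows. This is only a conceptual illustration, not LSF's actual algorithm nor the Pisa configuration; the groups, shares and usage numbers are invented.

```python
# Toy fairshare model: configured shares and recent usage are invented numbers,
# and the priority formula is a simplification of what LSF actually computes.
shares = {"cms": 60, "theory": 20, "other_grid": 20}             # hypothetical groups
recent_usage = {"cms": 4000.0, "theory": 100.0, "other_grid": 900.0}  # e.g. core-hours

def dynamic_priority(group):
    """Higher configured share raises priority; recent usage lowers it."""
    return shares[group] / (1.0 + recent_usage[group])

def next_group_to_dispatch(pending):
    """Among groups with pending jobs, pick the one with the highest priority."""
    candidates = [g for g in pending if pending[g] > 0]
    return max(candidates, key=dynamic_priority) if candidates else None

if __name__ == "__main__":
    pending_jobs = {"cms": 500, "theory": 30, "other_grid": 0}   # invented queue state
    print("next dispatch goes to:", next_group_to_dispatch(pending_jobs))
```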
Infrastructure
The Pisa Data Center largely exceeds the typical size of a department computing center. As such, it is equipped with enterprise-level infrastructure:
- Rack space: the computing room can host up to 34 full-height standard racks.
- Networking: we use a flat switching matrix, implemented via a Force10 E1200 fabric. It allows for point-to-point 1 Gbit/s (or 10 Gbit/s) connections between all the machines. We use 1 Gbit/s on all the computing nodes and 10 Gbit/s on all the storage nodes.
- Electrical distribution: we are served by a 300 kVA connection. All the services, and part of the computing nodes, are on UPS.
- Cooling: the computing room is served by 12 APC InRow cooling units, fed by 3 Emerson chillers on the roof (300 kW of refrigerating capacity).

[Poster photos: the Force10 E1200 fabric and the APC InRow units; screenshot: requesting 4 interactive cores in the UI cluster.]

Monitoring
We had to develop a number of monitoring tools to make sure the site is operational; these complement the standard CMS and WLCG tools:
- LSFMon
- Netvibes widgets to monitor dCache, LSF and the infrastructure status
Partially developed under the PRIN 2008MHENNA MIUR project.
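A minimal sketch of the kind of check such tools perform is given below: it polls the LSF batch system from a submit host and summarizes running and pending jobs per user. This is not the actual LSFMon code; it only assumes the standard `bjobs -u all -w` command and its default column layout (JOBID, USER, STAT, QUEUE, ...).

```python
import subprocess
from collections import Counter

def lsf_job_summary():
    """Return (running, pending) job counts per user from `bjobs -u all -w`."""
    proc = subprocess.run(["bjobs", "-u", "all", "-w"],
                          capture_output=True, text=True)
    running, pending = Counter(), Counter()
    for line in proc.stdout.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        user, stat = fields[1], fields[2]
        if stat == "RUN":
            running[user] += 1
        elif stat == "PEND":
            pending[user] += 1
    return running, pending

if __name__ == "__main__":
    run, pend = lsf_job_summary()
    for user in sorted(set(run) | set(pend)):
        print("%-12s RUN=%-5d PEND=%d" % (user, run[user], pend[user]))
```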

Poster: "Optimization of HEP Analysis activities using a Tier2 Infrastructure" [authors and PRIN statement]

[Architecture diagram: organized WAN transfers between PISA and another site use SRM and XrootD; GRID jobs submitted from a remote UI and local jobs submitted from the local UI cluster through the LSF frontend machine run on the PISA CPUs, which reach the disk over the LAN via dCap, POSIX and XrootD; remote UIs can also access files directly via XrootD streaming; local file access from the UI cluster is direct; LSF jobs can overflow onto the UIs.]
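The "remote file access via streaming" path in the diagram is what analyses typically exercise through ROOT: the file is opened over the WAN via its root:// URL and only the requested parts are read. A minimal sketch, assuming a PyROOT installation with XrootD support and using a hypothetical URL (the real Pisa XrootD endpoint is not given here):

```python
import ROOT  # PyROOT; assumes ROOT was built with XrootD support

# Hypothetical URL: replace with the site's actual XrootD door or redirector.
url = "root://xrootd.example.infn.it//store/user/someuser/ntuple.root"

f = ROOT.TFile.Open(url)      # streams over the WAN, no full local copy is made
if f and not f.IsZombie():
    f.ls()                    # list the objects in the remote file
    f.Close()
else:
    print("could not open", url)
```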
