Presentation transcript:

IT-DSS Alberto Pace 2 The mission of CERN. (Diagram: accelerating particle beams, detecting particles (experiments), large-scale computing (analysis), discovery; “we are here” marks the large-scale computing step.)

IT-DSS Alberto Pace 3 The need for computing in research. In recent years, scientific research has seen its computing requirements explode. Computing has been the strategy to reduce the cost of traditional research, and it has opened new horizons of research, not only in High Energy Physics. At constant cost, performance grows exponentially. The return on computing investment is higher than in other fields: the budget available for computing has increased, so growth is more than exponential.

IT-DSS Alberto Pace 4 The need for storage in computing. Scientific computing for large experiments is typically based on a distributed infrastructure resting on three pillars: data, CPU and network. Storage is one of the main pillars, and storage requires data management…

IT-DSS Alberto Pace 5 Roles of storage services. Three main roles: storage (store the data), distribution (ensure that data is accessible), preservation (ensure that data is not lost). (Diagram: tier pyramid with Tier-0 (LHC experiments): data production, storage, data distribution, data preservation; Tier-1: storage, data distribution; Tier-2, Tier-3; axes: size in PB + performance, availability, reliability.)

IT-DSS Alberto Pace 6 “Why” data management? Data management solves the following problems: data reliability, access control, data distribution, and data archives, history and long-term preservation. In general, it empowers the implementation of a workflow for data processing.

The Worldwide LHC Computing Grid 7 The ATLAS experiment: 7000 tons, 150 million sensors generating data 40 million times per second, i.e. a petabyte per second.

Alberto Pace, slide 8 At CERN: acquisition, first-pass reconstruction, storage, distribution, and data preservation. (Diagram data rates: 1 GB/s, 4 GB/s, 200 MB/s.)

Alberto Pace, slide 9 CERN computing infrastructure: network, CPUs, storage infrastructure, databases.

IT-DSS Alberto Pace 10 Dataflows for science. Storage in scientific computing is distributed across multiple data centres (tiers). Data flows from the experiments to all data centres where CPU is available to process it. (Diagram: Tier-0 (LHC experiments), Tier-1, Tier-2.)

IT-DSS Alberto Pace 11

IT-DSS Alberto Pace 12 Computing services for science. Whenever a site has idle CPUs (because no data is available to process) or an excess of data (because no CPU is left for analysis), efficiency drops.

IT-DSS Alberto Pace 13 Why is storage complex? High-efficiency analysis requires the data to be pre-placed where the CPUs are available, or requires peer-to-peer data transfer. The latter allows sites with an excess of CPU to schedule the pre-fetching of data that is missing locally, or to access it remotely if the analysis application has been designed to cope with high latency. (Diagram: Tier-0 (LHC experiments), Tier-1, Tier-2.)

IT-DSS Alberto Pace 14 Why is storage complex? Both approaches coexist. Data is pre-placed: this is the role of the experiments, which plan the analysis. Data is also globally accessible and federated in a global namespace: the middleware used for access always attempts to take the local data, or redirects to the nearest remote copy, and jobs are designed to minimize the impact of the additional latency that the redirection requires.
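
As a concrete illustration of the federated-access behaviour described above, here is a minimal Python sketch, not the actual CERN middleware (federation redirectors such as XRootD do this for real): sites, URLs and latencies are invented, and the policy is simply “local copy first, then the nearest remote copy”.

```python
# Minimal sketch of the access policy on this slide: always try the local
# replica first, then redirect to remote copies ordered by increasing latency.
# Sites, URLs and latencies are invented; real deployments rely on federation
# middleware (e.g. an XRootD-style redirector), not on this hand-rolled loop.
def access_plan(replicas, local_site):
    """replicas: list of dicts {"site": ..., "url": ..., "latency_ms": ...}."""
    local = [r for r in replicas if r["site"] == local_site]
    remote = sorted((r for r in replicas if r["site"] != local_site),
                    key=lambda r: r["latency_ms"])
    return local + remote  # jobs must tolerate the extra latency of remote reads

replicas = [
    {"site": "T2_EXAMPLE", "url": "root://t2.example//store/ds01", "latency_ms": 40},
    {"site": "T1_EXAMPLE", "url": "root://t1.example//store/ds01", "latency_ms": 15},
    {"site": "T0_EXAMPLE", "url": "root://t0.example//store/ds01", "latency_ms": 90},
]
for r in access_plan(replicas, local_site="T2_EXAMPLE"):
    print(r["url"])
```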

IT-DSS Alberto Pace 15 Which storage model? A simple storage model puts all data into the same storage (“cloud” storage): uniform, simple, easy to manage, no need to move data. It can provide a sufficient level of performance and reliability.

IT-DSS Alberto Pace 16 … but some limitations. Different storage solutions can offer different qualities of service. Three parameters: performance, reliability, cost; you can have two but not three. To deliver both performance and reliability you must deploy expensive solutions. That is OK for small sites (you save because the infrastructure is simple) but difficult to justify for large sites. (Diagram: triangle with corners labelled slow, expensive and unreliable; tapes, disks, flash/solid-state disks and mirrored disks placed on it.)

IT-DSS Alberto Pace 17 So, … what is data management? Two building blocks to empower data processing: data pools with different qualities of service, and tools for data transfer between pools. Examples from LHC experiment data models follow.

IT-DSS Alberto Pace 18 Storage services. Storage needs to be able to adapt to the changing requirements of the experiments: cover the whole area of the triangle (corners: slow, expensive, unreliable) and beyond.

IT-DSS Alberto Pace 19 Why multiple pools and qualities? Derived data used for analysis and accessed by thousands of nodes needs high performance and low cost, with minimal reliability (derived data can be recalculated). Raw data that needs to be analyzed requires high performance and high reliability, and can be expensive (small sizes). Raw data that has been analyzed and archived must be low cost (huge volumes) and highly reliable (it must be preserved); performance is not necessary.
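
A sketch of how the three data classes above could be mapped to pool qualities; the policy table, media labels and replica counts are invented for illustration and do not reflect an actual CERN/DSS configuration.

```python
# Illustrative mapping of the three data classes on this slide to pool
# qualities; media labels and replica counts are invented, not a real policy.
POOL_POLICY = {
    "derived-analysis": {"media": "disk", "replicas": 1},      # fast, cheap, recomputable
    "raw-hot":          {"media": "ssd+disk", "replicas": 2},  # fast and reliable, small volume
    "raw-archive":      {"media": "tape", "replicas": 2},      # huge volume, must be preserved
}

def pool_for(data_class):
    policy = POOL_POLICY[data_class]
    return f'{data_class}: {policy["replicas"]} copy/copies on {policy["media"]}'

for data_class in POOL_POLICY:
    print(pool_for(data_class))
```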

IT-DSS Alberto Pace 20 Data pools. Different qualities of service. Three parameters (performance, reliability, cost); you can have two but not three. (Diagram: the same triangle with slow, expensive and unreliable corners; tapes, disks, flash/solid-state disks and mirrored disks placed on it.)

IT-DSS Alberto Pace 21 … and the balance is not simple. There are many ways to split (performance, reliability, cost), and each has many sub-parameters: performance covers latency/throughput and scalability; cost covers electrical consumption, hardware cost and operations cost (manpower); reliability covers consistency.
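
One way to keep these sub-parameters together is a structured quality-of-service descriptor per pool, as in the sketch below; the dataclass, its field names and the example numbers are assumptions chosen to mirror the slide, not measured values.

```python
# Illustrative quality-of-service descriptor built from the sub-parameters on
# this slide; field names follow the slide, the example numbers are invented.
from dataclasses import dataclass

@dataclass
class PoolQoS:
    # performance
    latency_ms: float
    throughput_MB_s: float
    scalability: str            # e.g. "add servers", "add tape drives"
    # cost
    hw_cost_per_TB: float
    ops_cost_per_TB: float      # manpower
    power_W_per_TB: float       # electrical consumption
    # reliability
    annual_loss_fraction: float
    consistency: str            # e.g. "strong", "eventual"

archive_pool = PoolQoS(latency_ms=60_000, throughput_MB_s=300, scalability="add tape drives",
                       hw_cost_per_TB=8.0, ops_cost_per_TB=2.0, power_W_per_TB=0.1,
                       annual_loss_fraction=4e-6, consistency="strong")
print(archive_pool)
```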

IT-DSS Alberto Pace 22 … and reality is complicated. Key requirements: simple, scalable, consistent, reliable, available, manageable, flexible, performing, cheap, secure. Aiming for on-demand “quality of service”. And what about scalability?

IT-DSS Alberto Pace 23 Tools needed. Needed: tools to transfer data effectively across pools of different quality, and storage elements that can be reconfigured “on the fly” to meet new requirements without moving the data. Examples: moving petabytes of data from a multi-user disk pool into a reliable tape back-end; increasing the replica factor to 5 on a pool containing conditions data that requires access from thousands of simultaneous users; deploying petabytes of additional storage in a few days.
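
A minimal sketch of the second kind of tool, “on the fly” reconfiguration: raising a pool's replica factor and deriving the extra copies a background transfer tool would have to create. The function, catalogue format and file names are hypothetical, not a real DSS interface.

```python
# Hypothetical "on the fly" reconfiguration: raise a pool's replica factor and
# derive the extra copies a background transfer tool would have to create.
# The catalogue format and file names are invented, not a real DSS interface.
def retarget_replicas(catalogue, pool, new_factor):
    """catalogue: {filename: current_replica_count} for the files in `pool`."""
    work = []
    for name, current in catalogue.items():
        missing = new_factor - current
        if missing > 0:
            work.append((name, missing))   # schedule `missing` extra copies
    print(f"pool {pool}: replica factor -> {new_factor}, {len(work)} files need copies")
    return work

# The slide's example: a conditions-data pool raised to 5 replicas.
catalogue = {"cond_run1.db": 2, "cond_run2.db": 3, "cond_run3.db": 5}
for name, extra in retarget_replicas(catalogue, pool="conditions", new_factor=5):
    print(f"  copy {name} x{extra}")
```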

IT-DSS Alberto Pace 24 Roles of storage services. Three main roles: storage (store the data), distribution (ensure that data is accessible), preservation (ensure that data is not lost). (Diagram: tier pyramid with Tier-0 (LHC experiments): data production, storage, data distribution, data preservation; Tier-1: storage, data distribution; Tier-2, Tier-3; axes: size in PB + performance, availability, reliability.)

IT-DSS Alberto Pace 25 Multi-site transfers (200 Gbps). p(A and B) = p(A) × p(B|A)
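
A small numerical illustration of why the conditional form matters for multi-site transfers: if failures at the two sites are correlated (shared network link, common power feed), p(B|A) exceeds p(B), so the chance that both copies are unavailable is much higher than the independent-failure estimate. All probabilities below are invented.

```python
# Invented probabilities illustrating p(A and B) = p(A) * p(B|A):
# correlated failures make "both sites unavailable" far more likely than the
# naive independent estimate p(A) * p(B).
p_A = 0.01           # site A unavailable
p_B = 0.01           # site B unavailable
p_B_given_A = 0.30   # assumed correlation (shared link, common power feed)

independent = p_A * p_B           # 1.0e-04
correlated = p_A * p_B_given_A    # 3.0e-03, thirty times more likely

print(f"independent estimate : {independent:.1e}")
print(f"with correlation     : {correlated:.1e}")
```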

IT-DSS Alberto Pace 26 Reliability… You can achieve high reliability using standard disk pools: multiple replicas or erasure codes (beware of the independence of failures). Here is where tapes can play a role. Tapes??
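
A back-of-the-envelope comparison of the two disk-pool techniques, deliberately assuming fully independent disk failures (exactly the assumption the slide warns about); the per-disk failure probability and the 10+4 erasure-code layout are illustrative choices, not CERN parameters.

```python
# Back-of-the-envelope data-loss estimate for a disk pool, deliberately
# assuming *independent* disk failures (the assumption the slide warns about).
# The per-disk failure probability and the 10+4 layout are illustrative.
from math import comb

p = 1e-3  # assumed probability that a given disk fails within a repair window

def loss_replicas(n, p):
    """With n full replicas, data is lost only if all n copies fail."""
    return p ** n

def loss_erasure(k, m, p):
    """With k data + m parity chunks, data is lost if more than m chunks fail."""
    n = k + m
    return sum(comb(n, f) * p**f * (1 - p)**(n - f) for f in range(m + 1, n + 1))

print(f"3 replicas        : {loss_replicas(3, p):.1e}")
print(f"10+4 erasure code : {loss_erasure(10, 4, p):.1e}")
# Correlated failures (same batch, rack, controller) can make both far worse.
```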

IT-DSS Alberto Pace 27 Do we need tapes? Tapes have a bad reputation in some use cases: slow in random-access mode (high latency in the mounting process and when seeking data: fast-forward, rewind); inefficient for small files (in some cases); comparable cost per (peta)byte to hard disks.

IT-DSS Alberto Pace 28 Do we need tapes? Tapes have a bad reputation in some use cases: slow in random-access mode (high latency in the mounting process and when seeking data: fast-forward, rewind); inefficient for small files (in some cases); comparable cost per (peta)byte to hard disks. But tapes also have some advantages: fast in sequential access mode, more than 2x faster than disk, with physical read-after-write verification (4x faster); several orders of magnitude more reliable than disks (a few hundred GB lost per year on an 80 PB raw tape repository, versus a few hundred TB lost per year on a 50 PB raw disk repository); no power required to preserve the data; less physical volume required per (peta)byte; the small-file inefficiency has been resolved by recent developments; and tapes cannot be deleted in a few minutes. Bottom line: if not used for random access, tapes have a clear role in the architecture.
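
The arithmetic behind “several orders of magnitude”, using the repository sizes on this slide and taking “a few hundred” as roughly 300 in both cases:

```python
# Arithmetic behind "several orders of magnitude", with "a few hundred"
# taken as roughly 300 in both cases (figures from this slide).
tape_loss = 300e9 / 80e15    # ~300 GB per year lost out of 80 PB of tape
disk_loss = 300e12 / 50e15   # ~300 TB per year lost out of 50 PB of disk

print(f"tape : {tape_loss:.1e} of the repository lost per year")  # ~3.8e-06
print(f"disk : {disk_loss:.1e} of the repository lost per year")  # ~6.0e-03
print(f"ratio: ~{disk_loss / tape_loss:.0f}x")                    # ~1600x
```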

IT-DSS Alberto Pace 29 Large-scale media migration…

IT-DSS Alberto Pace 30 Summary. Data management solves the following problems: data reliability, access control, data distribution, and data archives, history and long-term preservation. In general, it empowers the implementation of a workflow for data processing.