Maintaining Traceability in an Evolving Distributed Computing Environment
Ian Collier, STFC; Romain Wartel, CERN

Introduction
Security incidents are an operational reality on our grid-based distributed computing infrastructure, and we are used to managing the risks inherent in running such infrastructures. When things go wrong we need, at a minimum, to know WHO did WHAT, WHEN they did it, and WHERE they did it. This allows us to contain the impact of incidents, preserve reputations and ensure that resources are available for their intended purposes. As our infrastructure evolves to include cloud resources, we must ensure that we maintain the traceability we depend upon.

Current traceability model
We have well-developed incident response procedures. The model assumes that sites have sight, and control, of the execution environments. Sites log in detail from the execution environment (worker nodes) and from Computing Elements (CEs), batch systems etc. to central loggers, and search these logs with more or less sophisticated tools (often variants of grep). glexec provides granular authorisation and traceability for multi-user pilot jobs, although it is not implemented universally. We have developed and agreed incident response procedures with clearly identified contact points and established trust relationships. Together these facilitate both the analysis of, and the response to, problematic activities.

Solutions - Logging
Increase the focus on externally observable behaviour: the hypervisor and cloud management framework, federated resource frameworks, and network activity and flows (neglected until now). Also connect VMs to central loggers at sites; this requires standardised hooks in VM images. Aggregation of, and cross-checking between, multiple sources is vital, and improved tools for storing, aggregating and searching logs are increasingly important.

Conclusions & Future work
The WLCG Cloud Traceability Working Group has been established to carry out practical development of possible solutions.
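
One practical piece of that development is the aggregation and cross-checking described under Solutions - Logging. The Python sketch below assumes two hypothetical record formats: VM lifecycle records exported by a cloud management framework, and network flow records from a flow collector. It attributes each flow to the VM that held the source address at that moment, producing WHO/WHAT/WHEN/WHERE entries, and separately reports flows that no VM record explains, which is the kind of gap cross-checking is meant to expose. All class, field and function names are illustrative assumptions, not those of any real framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class VMRecord:
    """VM lifecycle record from a cloud management framework (hypothetical format)."""
    vm_id: str
    owner: str                   # WHO launched the VM (user DN, VO, ...)
    ip: str                      # address assigned to the VM
    started: datetime
    stopped: Optional[datetime]  # None while the VM is still running


@dataclass
class FlowRecord:
    """Network flow record from a flow collector (hypothetical format)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    seen: datetime


@dataclass
class TraceEntry:
    """Answer to: WHO did WHAT, WHEN and WHERE."""
    who: str
    what: str
    when: datetime
    where: str


def _active(vm: VMRecord, t: datetime, slack: timedelta) -> bool:
    # True if the VM held its address at time t, widened by `slack` for clock skew.
    if t < vm.started - slack:
        return False
    return vm.stopped is None or t <= vm.stopped + slack


def attribute_flows(
    vms: List[VMRecord], flows: List[FlowRecord], slack: timedelta = timedelta(0)
) -> Tuple[List[TraceEntry], List[FlowRecord]]:
    """Cross-check flow records against VM records.

    Returns (matched, unexplained): trace entries for flows attributable to a VM,
    and flows whose source address no VM record accounts for at that time.
    """
    matched: List[TraceEntry] = []
    unexplained: List[FlowRecord] = []
    for flow in flows:
        hits = [vm for vm in vms if vm.ip == flow.src_ip and _active(vm, flow.seen, slack)]
        if not hits:
            unexplained.append(flow)
        for vm in hits:
            matched.append(TraceEntry(
                who=vm.owner,
                what=f"connection to {flow.dst_ip}:{flow.dst_port}",
                when=flow.seen,
                where=vm.vm_id,
            ))
    return matched, unexplained
```

Matching on time as well as address matters because a cloud platform can hand the same IP address to a new VM once the previous one has shut down; the `slack` parameter is an assumed knob for tolerating small clock differences between log sources.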
The results from all these areas of investigation will be used to develop updated policies setting out requirements for running these new forms of distributed computing infrastructure without compromising traceability (perhaps even improving it), and best practice recommendations on how to gather additional logging information and how to configure management frameworks and VM images. While this work is focused on the already well-developed WLCG collaboration, the policy and best practice we produce can provide a model for emerging cloud- and virtualisation-based distributed computing infrastructures.

An open question is how to develop trust frameworks that will allow sites to accept VOs as full partners in incident response, perhaps not unlike the work that allowed workloads to be distributed across grids.

[Diagram 2: INDIGO-DataCloud architecture diagram]

Emerging clouds
Private, public and federated cloud resources bring changes to many aspects of distributed computing. Workflows change, sometimes removing complexity for users (that is the aim), and things change for providers too; some tasks become easier. Clouds also introduce new software components (cloud management frameworks of varying complexity), and complex new frameworks for federating cloud resources are being actively developed. All of this creates many new ways for things to go wrong. Diagrams 1 and 2 show the OpenStack and INDIGO-DataCloud (one emerging federated cloud environment) architectures, illustrating just some of the many new, complex interactions.

Sites no longer have the same control of the execution environment (compare public cloud providers, who 'don't care what goes on inside'): VMs are launched by VOs, or by their workload management systems.

Solutions - Quarantining Images
Virtualisation allows us to capture VM images for forensic examination, which is a big advantage. But what if a hypothetical attacker deliberately uses short-lived VMs?
Images need to be retained for a tunable period after shutdown. Some cloud platforms already do this; others require an implementation, and OpenNebula and OpenStack using Ceph are common combinations that need one.

[Diagram 1: OpenStack Folsom architecture diagram]

Solutions - VOs as partners in Incident Response
For some grid jobs we already need to go to VOs to find out which user ran specific jobs, so that we can suspend just that user. Within WLCG, VOs already log workflows to support debugging and workload management. Could changes to VO logging better support traceability? Rather than attempting an up-front gap analysis, we can use traceability service challenges to identify the limits of current logging and suggest enhancements; payloads and challenge methodologies are currently in development. This is an opportunity to formally recognise the existing reality that we need the active participation of VOs in order to maintain traceability.
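
To make the VO partnership concrete: for some grid jobs the site sees only the pilot, so attributing activity to a person requires the VO's own records. The Python sketch below assumes a hypothetical VO log format in which each entry records which user's payload ran inside which pilot job, and when. `payload_user` answers the site's question for a single observed event, and `challenge_coverage` shows how a traceability service challenge might measure what fraction of site-observed events the VO log can actually attribute. Neither function corresponds to an existing VO or WLCG tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class PayloadRecord:
    """Entry in a VO workload-management log (hypothetical format): which
    user's payload ran inside which pilot job, over the interval [start, end)."""
    pilot_id: str
    user: str
    start: datetime
    end: datetime


def payload_user(vo_log: List[PayloadRecord], pilot_id: str, when: datetime) -> Optional[str]:
    """Return the user whose payload ran in `pilot_id` at `when`, or None if
    the VO log cannot say (a traceability gap)."""
    for rec in vo_log:
        # Half-open interval, so back-to-back payloads in one pilot never overlap.
        if rec.pilot_id == pilot_id and rec.start <= when < rec.end:
            return rec.user
    return None


def challenge_coverage(vo_log: List[PayloadRecord], events: List[Tuple[str, datetime]]) -> float:
    """Fraction of site-observed (pilot_id, time) events that the VO log can
    attribute to a user; a service challenge could track this number."""
    if not events:
        return 1.0
    answered = sum(1 for pilot_id, when in events if payload_user(vo_log, pilot_id, when) is not None)
    return answered / len(events)


if __name__ == "__main__":
    t = datetime(2015, 3, 10, 12, 0)
    log = [
        PayloadRecord("pilot-42", "alice", t, t + timedelta(hours=1)),
        PayloadRecord("pilot-42", "bob", t + timedelta(hours=1), t + timedelta(hours=2)),
    ]
    print(payload_user(log, "pilot-42", t + timedelta(minutes=90)))  # bob
    print(challenge_coverage(log, [("pilot-42", t), ("pilot-7", t)]))  # 0.5
```

A coverage figure below 1.0 points at exactly the logging limits a service challenge is meant to reveal, without needing an up-front gap analysis.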
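
For the retention requirement under Solutions - Quarantining Images, one way to make the period tunable is to record each image's shutdown time instead of deleting the image immediately, and to delete it only once the period has elapsed. This defeats the short-lived VM tactic, since the image outlives the VM. The Python sketch assumes a hypothetical `QuarantineStore` helper placed alongside a cloud platform's image deletion path; it is not an OpenNebula, OpenStack or Ceph API, and the seven-day period in the example is arbitrary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class QuarantineStore:
    """Hypothetical helper: keeps VM images after shutdown for a tunable period
    instead of deleting them immediately."""
    retention: timedelta  # tunable quarantine period
    _shutdowns: Dict[str, datetime] = field(default_factory=dict, init=False)

    def record_shutdown(self, image_id: str, when: datetime) -> None:
        # Called where the platform would otherwise delete the image at VM shutdown.
        self._shutdowns[image_id] = when

    def purgeable(self, now: datetime) -> List[str]:
        # Images whose quarantine period has fully elapsed, in a stable order.
        return sorted(i for i, t in self._shutdowns.items() if now >= t + self.retention)

    def purge(self, now: datetime) -> List[str]:
        # Forget expired images; a real deployment would delete them from storage here.
        expired = self.purgeable(now)
        for image_id in expired:
            del self._shutdowns[image_id]
        return expired


if __name__ == "__main__":
    t = datetime(2015, 3, 10)
    store = QuarantineStore(retention=timedelta(days=7))
    store.record_shutdown("img-a", t)
    store.record_shutdown("img-b", t + timedelta(days=3))
    print(store.purgeable(t + timedelta(days=5)))  # []
    print(store.purge(t + timedelta(days=7)))      # ['img-a']
```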