VMs at a Tier-1 site EGEE’09, 21-09-2009 Sander Klous, Nikhef.

Presentation transcript:

VMs at a Tier-1 site
EGEE'09, 21 September 2009
Sander Klous, Nikhef

Contents
- Introduction
  - Who are we?
- Motivation
  - Why are we interested in VMs?
  - What are we going to do with VMs?
- Status
  - How do we approach this issue?
  - Where do we stand?
- Challenges

Introduction
- BiG Grid is a collaboration between:
  - NCF: national computing facilities
  - Nikhef: national institute for subatomic physics
  - NBIC: national bioinformatics centre
- Participation from Philips, SARA, etc.
- Goal: "enable access to grid infrastructures for scientific research in the Netherlands"

Motivation: why virtual machines?
- Site perspective
  - Resource flexibility (e.g. SL4 / SL5)
  - Resource management: scheduling, multi-core, sandboxing
- User perspective
  - Isolation from the environment
  - Identical environment on multiple sites
  - Identical environment on the local machine

Different VM classes
- Class 1: site-generated virtual machines
  - No additional trust issues
  - Benefits for system administration
- Class 2: certified virtual machines
  - Inspection and certification to establish trust
  - Requirements for monitoring / integration
- Class 3: user-generated virtual machines
  - No trust relation
  - Requires appropriate security measures

Typical use case for a Class 1 VM: resource management of the site infrastructure
[Diagram: Torque/PBS serves a job queue and a VM queue; a Virtual Machine Manager drives Box 1 ("normal WN"), Box 2 ("8 virtual SL4 WNs") and Box 3 ("8 virtual SL5 WNs")]
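A minimal sketch (not part of the original slides) of what submitting a job to such a VM-backed batch queue could look like. The queue names `vm-sl4` and `vm-sl5` are assumptions; the real queue layout and the way the VM manager populates those queues are not specified here.

```python
import subprocess

def submit_to_vm_queue(job_script: str, os_flavour: str) -> str:
    """Submit a batch job to a queue backed by virtual worker nodes.

    The queue names are illustrative only; a site would map them to
    pools of SL4 or SL5 virtual machines started by the VM manager.
    """
    queue = {"sl4": "vm-sl4", "sl5": "vm-sl5"}[os_flavour]
    # qsub prints the job identifier on stdout when submission succeeds
    result = subprocess.run(
        ["qsub", "-q", queue, job_script],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

# Example: request an SL5 virtual worker node for this job
# job_id = submit_to_vm_queue("analysis.sh", "sl5")
```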

Typical use case for a Class 2 VM: analysis on virtual machines
- Run a minimal analysis on a desktop/laptop
  - Access to grid services
- Run the full analysis on the grid
  - Identical environment
  - Identical access to grid services
- No interest in becoming a system administrator
  - Standard experiment software is sufficient
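A minimal sketch (not from the slides) of booting the same certified image on a laptop with QEMU/KVM, so that the local run uses the environment that will later run on the grid. The image path, memory size and networking mode are assumptions.

```python
import subprocess

CERTIFIED_IMAGE = "/images/atlas-analysis-sl5.img"  # hypothetical certified image

def boot_local_vm(image: str = CERTIFIED_IMAGE, memory_mb: int = 2048) -> None:
    subprocess.run(
        [
            "qemu-system-x86_64",
            "-enable-kvm",                  # hardware acceleration if available
            "-m", str(memory_mb),           # guest memory
            "-smp", "2",                    # two virtual cores
            "-hda", image,                  # certified image as the boot disk
            "-net", "nic", "-net", "user",  # simple user-mode networking
        ],
        check=True,
    )
```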

Typical use case for a Class 3 VM: identification and classification of GPCRs
- Requires a very specific software set:
  - BLAST
  - HMMER
  - BioPython 1.50
- Even non-x86 (binary) applications!
- Specific software for this one user; no common experiment software
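Purely as an illustration of the kind of user-specific pipeline such a VM would carry, a small BioPython-style sketch; the input file and HMMER profile names are made up, and the actual GPCR workflow of this user is not described in the slides.

```python
import subprocess
from Bio import SeqIO  # BioPython, as bundled inside the user's image

QUERY_FASTA = "candidate_receptors.fasta"   # hypothetical input file
HMM_PROFILE = "gpcr_7tm.hmm"                # hypothetical HMMER profile

def classify_gpcr_candidates() -> None:
    # Read the candidate protein sequences shipped with the Class 3 image
    records = list(SeqIO.parse(QUERY_FASTA, "fasta"))
    print(f"Loaded {len(records)} candidate sequences")

    # Search them against a GPCR profile with HMMER (external binary
    # installed inside the VM image)
    subprocess.run(
        ["hmmsearch", "--tblout", "hits.tbl", HMM_PROFILE, QUERY_FASTA],
        check=True,
    )
```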

Project status
- Working group: virtualization of worker nodes
- Kick-off meeting on July 6th, 2009
  - System administrators, user support, management
- Phase 1 (3 months)
  - Collect site and user requirements
  - Identify other ongoing efforts in Europe
  - First design
- Phase 2 (3 months)
  - Design and implement a proof of concept

Active working group topics
- Policies and security issues for Class 2/3 VMs
- Technology study
  - Managing virtual machines
  - Distributing VM images
  - Interfacing the VM infrastructure with 'the grid'
- Identify missing functionality and alternatives
  - Accounting and fair share, image management, authentication/authorization, etc.

The Amazon identity crisis
The three most confronting questions:
1. What is the difference between a job and a VM?
2. Why can I do it at Amazon, but not on the grid?
3. What is the added value of grids over clouds?
"We don't want to compete with Amazon!"

Policy and security issues
- E-science services and functionality
  - Data integrity, confidentiality and privacy
  - Non-repudiation of user actions
- System administrator point of view
  - Trust user intentions, not their implementations
  - Incident response is more costly than certification
  - Forensics is time consuming

Security 101 = attack surface
A compromised user space is often already enough trouble.

Available policies
- Grid Security Policy, version 5.7a
- VO Portal Policy, version 1.0 (draft)
- Big Grid Security Policy
  - Grid Acceptable Use Policy, version 3.1
  - Grid Site Operations Policy, version 1.4a
  - LCG/EGEE Incident Handling and Response Guide, version 2.1
  - Grid Security Traceability and Logging Policy, version 2.0
- VO-Box Security Recommendations and Questionnaire, version 0.6 (draft, not ratified)

Relevant policy statements
- Network security is covered by site-local security policies and practices.
- A VO box is part of the trusted network fabric; privileged access is limited to resource administrators.
- Software deployed on the grid must include sufficient and relevant central logging at the site.

First compromise
- Certified package repository
  - Base templates
  - Certified packages
- Separate user disk
  - User-specific data
  - Permanent storage
- At run time
  - No privileged access
  - Comparable to a VO box
- Open question: licenses?
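A minimal sketch, not from the slides, of keeping user data on its own disk next to a certified root image; the paths, size and QEMU flags are assumptions.

```python
import subprocess

CERTIFIED_ROOT = "/repo/certified/sl5-analysis.img"  # hypothetical certified image
USER_DISK = "/vmstore/alice-user-data.img"           # hypothetical persistent user disk

def create_user_disk(size: str = "20G") -> None:
    # User-specific, permanent storage lives on its own disk image,
    # so the certified root image never has to be modified.
    subprocess.run(["qemu-img", "create", "-f", "raw", USER_DISK, size], check=True)

# The VM would then be started with two disks, e.g. (illustrative flags):
#   qemu-system-x86_64 -hda <certified root> -hdb <user disk> -m 2048
```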

Second compromise
- Put Class 3 VMs in a separate grid DMZ
  - Comparable to "guest networks": only outbound connectivity
- Detection of compromised guests
  - Extended security monitoring: packet inspection, netflows (SNORT, nfsen), honeypots, etc.
- Simple policy: one warning and you're out
- Needs approval of the network policy by the OST (Operations Steering Team)
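A rough, purely illustrative sketch of the "only outbound connectivity" rule for such a DMZ. The bridge and uplink interface names are assumptions, and a production setup would be considerably more elaborate.

```python
import subprocess

DMZ_BRIDGE = "br-vmdmz"   # hypothetical bridge carrying the Class 3 VMs
UPLINK = "eth0"           # hypothetical uplink interface

RULES = [
    # Default: forward nothing between the DMZ and the outside world
    ["iptables", "-P", "FORWARD", "DROP"],
    # Allow connections initiated from inside the DMZ (outbound only)
    ["iptables", "-A", "FORWARD", "-i", DMZ_BRIDGE, "-o", UPLINK, "-j", "ACCEPT"],
    # Allow replies to those connections back in
    ["iptables", "-A", "FORWARD", "-i", UPLINK, "-o", DMZ_BRIDGE,
     "-m", "state", "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
]

def apply_dmz_rules() -> None:
    for rule in RULES:
        subprocess.run(rule, check=True)
```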

TECHNOLOGY STUDY

Managing VMs: resource management at the site
[Diagram: Torque/PBS with a job queue and a VM queue; OpenNebula, together with the Haizea lease scheduler, manages Box 1 ("normal WN"), Box 2 ("8 virtual WNs") and Box 3 ("8 Class 2/3 VMs")]
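A minimal sketch of what starting one such managed VM through OpenNebula could look like. The template attributes, image and network names are illustrative, and the exact attribute set depends on the OpenNebula version in use.

```python
import subprocess
import tempfile

# Illustrative OpenNebula VM template for one virtual SL5 worker node.
VM_TEMPLATE = """
NAME   = "virtual-wn-sl5"
CPU    = 1
MEMORY = 2048
DISK   = [ IMAGE = "sl5-worker-node" ]
NIC    = [ NETWORK = "wn-network" ]
"""

def start_virtual_worker_node() -> None:
    with tempfile.NamedTemporaryFile("w", suffix=".one", delete=False) as f:
        f.write(VM_TEMPLATE)
        template_path = f.name
    # 'onevm create' hands the template to the OpenNebula scheduler,
    # which places the VM on one of the hypervisor boxes.
    subprocess.run(["onevm", "create", template_path], check=True)
```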

Distributing VM images: Class 2/3 upload solution based on iSCSI/LVM
[Diagram: images are uploaded to a repository on the SAN and exported via iSCSI/LVM to Box 1 ("normal WN"), Box 2 ("8 virtual WNs") and Box 3 ("8 Class 2/3 VMs")]
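A rough sketch, under assumptions about the SAN address, of how a hypervisor box could attach the exported images over iSCSI; the per-VM volume handling with LVM is only hinted at in the comments.

```python
import subprocess

SAN_PORTAL = "192.0.2.10"   # hypothetical SAN address (documentation range)

def attach_images_from_san() -> None:
    # Discover the iSCSI targets exported by the image repository
    subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", SAN_PORTAL],
        check=True,
    )
    # Log in; the exported volumes then show up as local block devices
    # (e.g. under /dev/disk/by-path/), where LVM can carve out or
    # snapshot per-VM volumes.
    subprocess.run(["iscsiadm", "-m", "node", "--login"], check=True)
```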

Cached copy-on-write
[Diagram: the repository holds the master image; each box (Box 1, Box 2) keeps a local cached copy of the image and runs its VMs from copy-on-write overlays on top of that cache]
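A minimal sketch of the cached copy-on-write idea, with made-up paths: copy the master image from the repository into a local cache once per box, then give every VM a cheap qcow2 overlay on top of it.

```python
import shutil
import subprocess
from pathlib import Path

REPO_IMAGE = Path("/repository/images/sl5-worker.img")  # hypothetical master image
CACHE_DIR = Path("/var/cache/vm-images")                # hypothetical local cache

def create_cow_overlay(vm_name: str) -> Path:
    cached = CACHE_DIR / REPO_IMAGE.name
    if not cached.exists():
        # Fetch the master image from the repository only once per box
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        shutil.copy2(REPO_IMAGE, cached)
    overlay = CACHE_DIR / f"{vm_name}.qcow2"
    # The overlay records only this VM's writes; reads fall through
    # to the shared cached image.
    subprocess.run(
        ["qemu-img", "create", "-f", "qcow2", "-b", str(cached), str(overlay)],
        check=True,
    )
    return overlay
```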

Interfacing VMs with 'the grid'
[Diagram: the grid middleware chain (globus-job-run, globus-gatekeeper, globus-job-manager) selects a resource through a contact string such as jm-pbs-long or jm-opennebula, dispatching either to qsub (Torque/PBS) or to OpenNebula for resource management; Class 2/3 images come from the repository (SAN) upload solution, with Nimbus/OCCI listed as a discussion point for Class 2/3 access]
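Purely as an illustration of the contact-string idea: routing the same submission either to the normal batch system or to the VM infrastructure would only change the contact string. The hostname and job-manager names below are assumptions based on the jm-pbs-long / jm-opennebula labels in the diagram.

```python
import subprocess

CE_HOST = "ce.example.org"  # hypothetical computing element

def run_on_grid(executable: str, use_vm: bool) -> None:
    # The job-manager part of the contact string decides where the job
    # lands: the regular PBS queue or the OpenNebula-backed VM service.
    jobmanager = "jobmanager-opennebula" if use_vm else "jobmanager-pbs"
    contact = f"{CE_HOST}/{jobmanager}"
    subprocess.run(["globus-job-run", contact, executable], check=True)

# run_on_grid("/bin/hostname", use_vm=False)   # classic batch worker node
# run_on_grid("/bin/hostname", use_vm=True)    # virtual machine endpoint
```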

Design: VM contact-string
- User management mapping
  - Mapping grid users to OpenNebula users
- Authentication / authorization
  - Access to different VM images
- Grid middleware components involved:
  - CREAM-CE, BLAHP, glexec
  - Execution Environment Service
  - Authorization Service
- Coffee table discussion: the parameter passing issue
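A toy sketch of the user-mapping and authorization step, with made-up DNs, OpenNebula account names and image lists; in the real design this logic would live in the CREAM-CE / BLAHP / glexec chain and the authorization service rather than in a lookup table.

```python
# Toy mapping from grid identity (certificate DN) to an OpenNebula user
# and the VM images that identity is allowed to instantiate.
DN_TO_ONE_USER = {
    "/DC=org/DC=example/CN=Alice": "one_alice",     # hypothetical DN and account
    "/DC=org/DC=example/CN=Bob":   "one_biomed",
}

IMAGE_ACL = {
    "one_alice":  {"sl5-worker-node", "atlas-analysis"},
    "one_biomed": {"gpcr-pipeline"},
}

def authorize(dn: str, image: str) -> str:
    """Return the OpenNebula account to use, or raise if not allowed."""
    user = DN_TO_ONE_USER.get(dn)
    if user is None or image not in IMAGE_ACL.get(user, set()):
        raise PermissionError(f"{dn} may not start image '{image}'")
    return user
```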

Monitoring / performance testing

Performance
- Small cluster
  - 4 dual-CPU quad-core machines
  - Image server with 2 TB of storage
- Integration with the experimental testbed
  - Existing CREAM-CE / Torque
- Testing
  - Network I/O: is NAT feasible?
  - File I/O: what is the COW overhead?
  - Realistic jobs
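A deliberately simple sketch, with a made-up file size, of the kind of file I/O comparison that would quantify the COW overhead: write the same amount of data inside a VM backed by a plain image and inside one backed by a qcow2 overlay, and compare the timings.

```python
import os
import time

def timed_write(path: str, size_mb: int = 512) -> float:
    """Write size_mb of data to path and return the elapsed seconds."""
    block = os.urandom(1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually hits the disk
    return time.time() - start

# Run once inside a VM on a plain image and once inside a VM on a
# copy-on-write overlay; the ratio of the two timings gives a first
# estimate of the COW write overhead.
# print(timed_write("/scratch/testfile", 512))
```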

Other challenges
- Accounting and scheduling based on fair share
- Scalability!
- Rapidly changing landscape
  - New projects every week, new versions every month
- So many alternatives
  - VMware, SGE, Eucalyptus, Enomaly
  - iSCSI, NFS, GFS, Hadoop
  - Monitoring and security tools

Conclusions
- Maintainability: no home-grown scripting
  - Each solution should be part of a product
  - Validation procedure with each upgrade
- Deployment: gradually move VM functionality into production
  1. Introduce VM worker nodes
  2. Add a virtual machine endpoint in the grid middleware
  3. Test with a few specific Class 2/3 VMs
  4. Scaling and performance tuning