GGF12 – 20 Sept 2004 - 1 LCG Incident Response Ian Neilson LCG Security Officer Grid Deployment Group CERN.

1 LCG Incident Response
Ian Neilson, LCG Security Officer, Grid Deployment Group, CERN

2 Background
LCG: Large Hadron Collider (LHC) Computing Grid
- Computing environment for the 4 LHC experiments: ALICE, ATLAS, CMS, LHCb
- LHC operation in 2007
- Required: 12-14 PetaBytes/year, compute equivalent to 70,000 PCs
- LCG1/2003, LCG2/2003-4, EGEE
- 70+ sites in Europe, USA, Asia, S. America ……
- 7000+ CPUs, 6000GB+ storage
- Software certification, testing, deployment group
- Distributed GOCs
  - UK: http://goc.grid-support.ac.uk/gridsite/gocmain/monitoring/
  - Taiwan: http://goc.grid.sinica.edu.tw/goc/
www.cern.ch/lcg

3 Grid monitoring

4 EGEE - Enabling Grids for E-science in Europe
- 12 federations with 70 partner institutions
- 2 year + 2 project
- Operate a service grid facility for e-science
  - Initially built on LCG2 infrastructure
- Re-engineer a robust middleware layer: gLite
- Attract new users
  - Research and industry
  - Broader focus than HEP: biomedical, Earth science ……
www.cern.ch/egee

5 Policy - the Joint Security Group
- Security & Availability Policy
- Usage Rules
- Certification Authorities
- Audit Requirements
- GOC Guides
- Incident Response
- User Registration
- Application Development & Network Admin Guide
http://cern.ch/proj-lcg-security/documents.html

6 Incident Response Policy
- Agreement on Incident Response, June 2003, for LCG1
- What is an incident?
  - Security investigation causing service interruption
  - Suspected misuse of resources beyond site
  - “Reasonable possibility” of stolen credentials
    - Not to expire or be revoked within 3 days
- Classifications
  - Identity theft
  - Suspected / Probable / Confirmed
- Actions
  - Misuse / Enforcement / Restoration / Escalation
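The three incident criteria on this slide can be read as a simple decision rule. The following is a minimal sketch of that reading, not the policy's actual wording or any LCG tool: the `Report` fields, the `is_incident` function, and the treatment of an unknown expiry date as "still valid" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The 3-day window comes from the slide; everything else here is hypothetical.
CREDENTIAL_WINDOW = timedelta(days=3)


@dataclass
class Report:
    """A hypothetical record of what a site has observed."""
    service_interrupted: bool = False          # investigation caused service interruption
    misuse_beyond_site: bool = False           # suspected misuse of resources beyond the site
    credentials_possibly_stolen: bool = False  # "reasonable possibility" of stolen credentials
    credential_expiry: Optional[datetime] = None  # None means the expiry is unknown


def is_incident(report: Report, now: datetime) -> bool:
    """Return True if the report meets any of the three criteria on the slide."""
    if report.service_interrupted or report.misuse_beyond_site:
        return True
    if report.credentials_possibly_stolen:
        # Assumption: an unknown expiry is treated as still valid, so it counts.
        if report.credential_expiry is None:
            return True
        # Only credentials that outlive the 3-day window qualify.
        return report.credential_expiry - now > CREDENTIAL_WINDOW
    return False
```

Under this reading, a possibly stolen credential that expires tomorrow would not by itself count as an incident, while one valid for another ten days would.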

7 Incident Response - Communications
- Site enrolment collects 2 entries per site
  - Registration questionnaire
- Site Contacts mail list
  - Closed list of named individuals
  - email, telephone
- CSIRT list
  - mail
  - List-of-lists (open)
  - 1 entry per site
- Updated list circulated to contacts list as sites enrol
- Pointers to policy documents for responsibilities
- Channels
  - Users: local site contacts (& GOC)
  - Contacts: discussion and information exchange
  - CSIRT: incident notification, update
  - Roll-out: system administrators
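The channel list on this slide amounts to a small routing table from message type to destination. A minimal sketch follows, assuming hypothetical message-type keys (`user_report`, `discussion`, and so on) that do not appear in the original; only the destinations are taken from the slide.

```python
# Destinations follow the "Channels" list on the slide; the keys are
# hypothetical labels introduced for this sketch.
CHANNELS = {
    "user_report": "local site contacts (& GOC)",
    "discussion": "Contacts list",
    "information_exchange": "Contacts list",
    "incident_notification": "CSIRT list",
    "incident_update": "CSIRT list",
    "rollout": "Roll-out list (system administrators)",
}


def channel_for(message_type: str) -> str:
    """Look up the channel for a message type, rejecting unknown types."""
    try:
        return CHANNELS[message_type]
    except KeyError:
        raise ValueError(f"no channel defined for {message_type!r}") from None
```

Keeping discussion traffic on the Contacts list rather than the CSIRT list is the point of the separation: the CSIRT channel carries only notifications and updates.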

8 Incident Response - management issues
- LCG “community” known at CERN; EGEE community is broader
- User enrolment is well controlled, site enrolment is not
  - Incomplete questionnaires
  - Personal instead of list
  - List instead of personal
  - Undeliverable addresses
  - Delayed delivery
  - Moderated delivery
  - Enrolment information not circulated
  - SPAM, SPAM, SPAM, SPAM
- Lists need active management!
- Can we “see” all the sites?
  - CERN/GOC view
  - VO “private” information systems
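Some of the enrolment problems listed here (incomplete questionnaires, malformed addresses) could be caught by an automated check at registration time. The sketch below is one hypothetical way to do that; the `SiteEnrolment` fields and both function names are assumptions, and the address check is purely syntactic, so it cannot detect undeliverable, delayed, or moderated delivery, nor tell a personal address from a list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteEnrolment:
    """Hypothetical record of the two contact entries collected per site."""
    site: str
    contact_email: str = ""  # named security contact
    contact_phone: str = ""
    csirt_list: str = ""     # site CSIRT list address


def looks_like_address(addr: str) -> bool:
    """Rough syntactic check only; it says nothing about deliverability."""
    addr = addr.strip()
    local, sep, domain = addr.partition("@")
    return bool(local) and sep == "@" and "." in domain and " " not in addr


def enrolment_problems(entry: SiteEnrolment) -> List[str]:
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    if not entry.contact_email:
        problems.append("missing contact email")
    elif not looks_like_address(entry.contact_email):
        problems.append("malformed contact email")
    if not entry.contact_phone:
        problems.append("missing contact telephone")
    if not entry.csirt_list:
        problems.append("missing CSIRT list address")
    elif not looks_like_address(entry.csirt_list):
        problems.append("malformed CSIRT list address")
    return problems
```

A check like this only covers the first step of active list management; delivery problems still need a periodic test message and a human follow-up.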

9 Incident response - operational issues
- Recognising and reporting
- What is a local CSIRT?
  - Scale of coverage
    - 24x7 site/campus network operations team
    - Department Security Officer
    - LCG system administrator
- Who is a security contact?
  - As above
- Intersection with local CSIRT procedures
  - Local quarantine and analysis
- Keeping emergency channels clear
  - Discussions, cross-postings

10 Incident response - near-term
JSG, EGEE MWSG/JRA3, OSG, ……
- Site and VO registration policy and process
  - Control gathering, distribution and management of data
  - Sites need to understand requirements and responsibilities
    - Coverage, access, audit
  - Needs to be actively managed (? self-managed)
- Operational Security Co-ordination Team (OSCT)
  - Ownership of security incidents
    - From notification to resolution
    - Liaise with national/institute CERTs
  - Ownership of known problems
    - Liaise with development & deployment groups
  - Co-ordination of monitoring
  - Post-mortem analysis
  - Team of experts

11 Security Co-ordination
- How does OSCT map onto EGEE operations structures?
  - Resource Centres (lots)
  - Regional Operations Centres - ROC (~9)
  - Core Infrastructure Centres - CIC (~5)
  - Operations Management Centre - OMC (1)
- Co-ordination with Open Science Grid ………
  - Adopt same co-ordinating model

12 2004 Security Service Challenges
- Objectives
  a) Evaluate the effectiveness of current procedures by simulating a small and well-defined set of security incidents.
  b) Use the experiences of a) in an iterative fashion (during the challenges) to update procedures.
  c) Formalise the understanding gained in a) & b) in updated incident response procedures.
  d) Provide feedback to middleware development and testing activities to inform the process of building security test components.
- Exercise response procedures in a controlled manner
  - Non-intrusive
    - Compute resource usage trace to owner: run a job to send an email
    - Storage resource trace to owner: run a job to store a file
  - Disruptive
    - Disrupt a service and map the effects on the service and grid

13 LCG/EGEE Incident Response
Thank You
Thank you to UK PPARC

