LCG/EGEE Incident Response Planning Ian Neilson Grid Deployment Group CERN EGEE3 Athens 21 April 2005 - 1
Incident Response Overview The OSG Incident Handling and Response Guide What it mandates What it recommends How to map to LCG/EGEE? Response Planning Classification based Use-cases Role of OSCT, ROCS etc. Grid Incident definition: “..event that poses a .. threat [to] the integrity of services, resources, infrastructure, or identities.” Note: We do have existing but old agreement: https://edms.cern.ch/document/428035 EGEE3 Athens 21 April 2005 - 2
Incident Response The OSG Incident Handling and Response Guide What it mandates (MUST do’s) REPORT RESPOND PROTECT information gathered ANALYSE What it recommends (SHOULD do’s) Provide monitored contact mailing lists at sites Public Disclosure (summary) through site Public Relations Use signed mails How to map to LCG/EGEE? EGEE3 Athens 21 April 2005 - 3
Incident Response Reporting (MUST) Provide contact information Individual contacts Monitored list (optional but HIGHLY desirable) Management through GOCDB (?soon) Report to LOCAL site security = sites should have local plan Does not replace or interfere with local plans Report to INCIDENT-REPORT-L@project.org Initial incident notification only, no chat Closed list Filtered abuse@.. & security@.. Currently we use project-lcg-security-csirts@cern.ch -egee- alias Open list hence no moderated lists EGEE3 Athens 21 April 2005 - 4
Incident Response Responding (MUST) Classification Containment More later… Containment Assumes local containment process in place Attacks through the grid Default action to block grid access initially Authorization control MUST be provided for services Attacks on the grid Little/no possible central control Notify the attacking site (NREN CSIRTS) Coordination of blocking, restoration of service Notification INCIDENT-DISCUSS-L@project.org User, VO if identity compromise Management Post-Incident Analysis EGEE3 Athens 21 April 2005 - 5
Incident Response Response Planning Objectives Provide a framework to use when something happens But must be usable flexibly Can be tested Classification Based ‘Use Cases’ LOW e.g. Local single non-privileged identity compromised, local denial of service. MEDIUM e.g. Local privileged identity compromised, attack on grid service not threatening grid stability. HIGH e.g. Exploitation of trust fabric, attack leading to grid instability or denial of service against all service replicas. EGEE3 Athens 21 April 2005 - 6
Incident Response Incident Class: LOW Case: Some local unattributed activity from a single user Reported by user to GGUS based on accounting Actors: GGUS, User, Site Response: Report received by SSO from GGUS Investigation (Site <-> User) Analysis: Single identity compromised Route to compromise may change response Notification by SSO: User, VO(suspend), REPORT-L, GGUS Issues: How does SSO contact user, VO? 000’s of distributed users -> notification overload …. EGEE3 Athens 21 April 2005 - 7
Incident Response Incident Class: Medium Case: Compute Element relaying SPAM Reported by external corporate security team Actors: Site, Ex-CSIRT Response: Report received by SSO from Ex-CSIRT Prelim. Investigation (SSO) Close service, remove from network Notification: REPORT-L, NREN CSIRT Analysis: CE rooted Notification: DISCUSS-L Issues Service-based threat analysis would be useful EGEE3 Athens 21 April 2005 - 8
Incident Response Incident Class: HIGH Case: Poisoning of information system Reported simultaneously from multiple locations, widespread job misdirection, failure, DOS Actors: Multiple Site, Experts, Developers/Deployment + Users, VOs, Management Response Reports might start as unassociated REPORT-L posts Team leader created, creates team – Notify: REPORT-L Prelim Analysis: logs, network traffic Mitigation: network blocking, rebuild/lock IS – Notify: DISCUSS-L Analysis: Broken protocol Patch Created, Packaged and Deployed – Notify: Site admin Notification: DISCUSS-L Issues How to bootstrap team and its leader – ROC/OSCT role Team communications – Incident tracking tools Team communications to developers Deployment scheduling – how long EGEE3 Athens 21 April 2005 - 9
Incident Response Incident Tracking Repository? Would be useful for Communications Analysis Reporting Could be difficult because of Which groups have access Becomes a target itself May duplicate logging of local systems How would it interact with local systems Current candidates LCG Savannah – Use for Service Challenge GGUS – Certificate Support but need development …? EGEE3 Athens 21 April 2005 - 10
Incident Response Summary OSG Document now needs to be implemented Need to put this into LCG/EGEE context Basic IR planning and model plans are necessary Use-case and role-play useful tools for creating these Should be tested in future security exercises OSCT is the group to organise this Note: Andrew Cormack’s document comparing grid incident response to ‘traditional’ CSIRT activities attached to the agenda. EGEE3 Athens 21 April 2005 - 11