Building An Analytics-Enabled Security Operations Ctr (SOC)

Name: Building An Analytics-Enabled Security Operations Ctr (SOC)
Uploaded: 2017-07-07T14:47:57+00:00
Duration: PTM33S32
Channel: Kelsey Hey
Description: Building An Analytics-Enabled Security Operations Ctr (SOC)

Building An Analytics-Enabled Security Operations Ctr (SOC)
Mike Munn Splunk Engineering Manager

Who Can Benefit From This PPT?
Primary: Wants to Build a SOC
Secondary: Wants to Enhance Existing SOC; Performs SOC-Like Functions
Customers who want to build a SOC are the primary audience; the others are secondary. Even small orgs with no formal SOC plans can learn from this PPT. The material is a summary of what our customers across many industries and sizes tend to do; precise SOC requirements will be different for each organization.

What is a Security Operations Center (SOC)?
Centralized location(s) where key IT systems of an organization are monitored, assessed and defended from cyber attacks.
PRIMARY GOAL: Reduce risk via improved security
SECONDARY GOALS: Compliance, anti-DDoS, fraud detection
Without a SOC, visibility is often siloed and incomplete, which leads to a weaker security posture. Consolidating the security experts and relevant data in a central location lets threats be spotted faster and brings efficiencies.

Before Building SOC Need to Understand:
Significant upfront and ongoing investment of money and time
Prerequisite is a certain security maturity level
Structure will vary for each organization
Important to prioritize and phase the build-out
Executive-level and business unit support required
To build a SOC you need basic security products and processes in place and tuned (see SANS 20 for examples), as well as enough skilled people to run it. If you do not have a basic level of maturity, address that first. Prioritization covers: data sources to onboard (most critical sources first), which threats to model and look for, playbooks, people, and staffing hours (start at 8x5 and move to 24x7, etc.).

Three Interrelated Components of a SOC
Process, Technology, People
Any SOC comprises people, process, and technology. All three are critical to a successful SOC.

Process

Threat Modeling & Playbooks
1. What threats does the organization care about?
Intellectual property or customer data loss, compliance, etc. Prioritize based on impact.
2. What would the threat look like?
How it would access and exfiltrate confidential data.
3. How would we detect/block the threat?
Requires machine data and external context. Searches or visualizations that would detect it (correlated events, anomaly detection, deviations from a baseline, risk scoring).
4. What is the playbook/process for each type of threat?
Severity, response process, roles and responsibilities, how to document, how to remediate, when to escalate or close, etc.
This is step one of the SOC build-out and prioritizes where to get started.
Notes on 1: Could also include DDoS, protecting an asset or person, etc. Business people will help you decide this, perhaps based on the overall cost a specific threat could impose on the organization.
Notes on 2: The “indicators of compromise”.
Notes on 3: Includes the machine data needed to spot the threat (this drives which data sources to prioritize) and the searches needed to detect it (correlated events, anomaly detection, deviations).
Notes on 4: This is all the detail on what to do when a specific alert is generated. It will vary based on the threat, but the playbook should have enough detail that when the alert pops up, everyone knows how to deal with it appropriately.
Not shown here, but red team or simulation exercises are helpful to make sure processes work correctly. Red team exercises can also find unknown weaknesses that should be addressed in threat modeling.
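The four threat-modeling questions can be captured as one record per threat. The sketch below is only an illustration: the `Playbook` fields, the `Severity` levels, and the sample threats are assumptions made for this example, not part of the presentation or of any product. It shows how playbook severity can drive which data sources to onboard first.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Playbook:
    """One playbook entry; every field name here is illustrative."""
    threat: str            # question 1: what the organization cares about
    indicators: list       # question 2: what the threat would look like
    data_sources: list     # question 3: machine data needed to spot it
    detections: list       # question 3: searches that would detect it
    severity: Severity     # question 4: how urgent a resulting alert is
    response_steps: list = field(default_factory=list)
    escalate_when: str = ""


def onboarding_order(playbooks):
    """List data sources so those needed by the most severe threats come first."""
    ordered = []
    for pb in sorted(playbooks, key=lambda p: p.severity.value, reverse=True):
        for source in pb.data_sources:
            if source not in ordered:
                ordered.append(source)
    return ordered


# Hypothetical sample playbooks.
phishing = Playbook(
    threat="credential phishing",
    indicators=["login from a new country shortly after a mail click"],
    data_sources=["email gateway", "web proxy"],
    detections=["correlate mail click with later login"],
    severity=Severity.MEDIUM,
)
exfiltration = Playbook(
    threat="customer data exfiltration",
    indicators=["large outbound transfer from a file server"],
    data_sources=["web proxy", "DNS", "file server logs"],
    detections=["deviation from outbound byte baseline"],
    severity=Severity.HIGH,
    response_steps=["isolate host", "open ticket", "notify data owner"],
    escalate_when="transfer is confirmed to contain customer records",
)
```

Calling `onboarding_order([phishing, exfiltration])` puts the exfiltration playbook's sources ahead of the email gateway, which mirrors the advice to onboard the most critical sources first.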

Simplified SOC Tiers
ALERTS FROM: Security Intelligence Platform, Help Desk, Other IT Depts.
TIER 1: Monitoring. Opens tickets, closes false positives. Basic investigation and mitigation.
TIER 2: Deep investigations/CSIRT. Mitigation/recommends changes.
TIER 3+: Advanced investigations/CSIRT. Prevention, threat hunting, forensics, counter-intelligence, malware reverser. (MINIMIZE INCIDENTS REACHING THEM)
This is the basic process/incident flow in a SOC. Incidents come in at top left and are then processed by the personnel in the different tiers.
Tier 1 analysts are typically the least skilled. They try to quickly dismiss false positives and, for real incidents, open a ticket and attempt to remediate. If they cannot remediate an incident or do not fully understand the threat, they escalate it to the more skilled tier 2 analysts.
Tier 2 analysts often use more advanced tools, such as packet capture tools, to research an incident. Tier 2 tries to investigate and remediate all incidents, but if it cannot, it may escalate the incident to the most advanced analysts, tier 3.
Since tier 3 analysts are the most skilled and expensive, it is key to limit the incidents reaching them to the very “difficult” or critical ones.
Notice the responsibilities of the tiers on the right. We will come back to this later and to how the proper technology can help with most of these use cases.
Tier 2/3 can relay feedback into the rest of the org to improve security. Tier 3 may be part of the incident review process, but in some orgs it is a separate team within the SOC. Sometimes the CSIRT (Computer Security Incident Response Team) sits within the SOC as the tier 2/3 levels, and sometimes it is outside the SOC and distributed across the organization.
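The escalation flow between tiers can be sketched as a small state change on an incident record. This is a minimal illustration under stated assumptions: the `Incident` class, its fields, and the three-tier cap are hypothetical and stand in for whatever a real ticketing system provides.

```python
from dataclasses import dataclass, field

MAX_TIER = 3  # the slide shows three tiers; larger SOCs may have more


@dataclass
class Incident:
    summary: str
    tier: int = 1          # every alert starts with a tier 1 analyst
    status: str = "open"
    history: list = field(default_factory=list)


def close_false_positive(incident: Incident) -> None:
    """Tier 1's fast path: dismiss the alert without escalating."""
    incident.status = "closed"
    incident.history.append(f"tier {incident.tier}: closed as false positive")


def escalate(incident: Incident, reason: str) -> None:
    """Hand the incident to the next tier, recording why."""
    if incident.tier >= MAX_TIER:
        raise ValueError("already at the highest tier")
    incident.history.append(f"tier {incident.tier}: escalated ({reason})")
    incident.tier += 1
```

Recording a reason on every escalation keeps a trail that can later show how many incidents reach tier 3, which is the number the slide says to minimize.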

One vs. Multiple Locations
One Location: Morning, Afternoon, Midnight
Multiple Locations: Morning, Afternoon, Midnight across West Coast, East Coast, APAC
Most organizations use one location.
One location: better communication and easier continuity and management, but more expensive because a pay differential is owed to employees working the late hours.
Multiple locations: harder to work on the same issues (including language issues), but cheaper because no differential pay is needed.

Shift Rotations – One Location
Seattle
SHIFT 1: 7AM to 5PM
SHIFT 2: 3PM to 1AM
SHIFT 3: 11PM to 9AM
Staffing across the three shifts: TIER 1, TIER 1, TIER 1; TIER 2, TIER 2; TIER 3

Shift Rotations – Multiple Locations
SHIFT 1: Seattle, 9AM to 5PM
SHIFT 2: New York, 9AM to 5PM
SHIFT 3: Hong Kong, 9AM to 5PM
Staffing across the three locations: TIER 1, TIER 1, TIER 1; TIER 2, TIER 2, TIER 2; TIER 3

Operational Continuity
Shift Overlaps
Shift Handover Procedures
Shift Reports
Overlap is key so knowledge is transferred smoothly and the outgoing shift can bring the incoming shift up to speed. Handover is key: everyone gets into a room, shares what is going on, and agrees or disagrees on next steps. The shift report is paperwork that collects the shift's attack reports. It lists the cases worked, with comments, and ongoing attacks and where they stand.
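As a quick check of the overlap idea, the sketch below computes the handover window between consecutive shifts using the single-location times shown earlier (7AM to 5PM, 3PM to 1AM, 11PM to 9AM). The function names are made up for this example, and the calculation assumes whole hours and that each incoming shift starts before the outgoing one ends.

```python
# Shift start and end hours on a 24-hour clock, from the single-location slide.
SHIFTS = [
    ("SHIFT 1", 7, 17),   # 7AM to 5PM
    ("SHIFT 2", 15, 1),   # 3PM to 1AM
    ("SHIFT 3", 23, 9),   # 11PM to 9AM
]


def overlap_hours(outgoing_end: int, incoming_start: int) -> int:
    """Hours from the incoming shift's start to the outgoing shift's end.

    Assumes the shifts really do overlap; a coverage gap would wrap
    around midnight and show up as a large number instead.
    """
    return (outgoing_end - incoming_start) % 24


def handover_windows(shifts):
    """Map each handover ("A -> B") to its overlap in hours."""
    windows = {}
    for i, (name, _start, end) in enumerate(shifts):
        next_name, next_start, _next_end = shifts[(i + 1) % len(shifts)]
        windows[f"{name} -> {next_name}"] = overlap_hours(end, next_start)
    return windows
```

With these times every handover has a two-hour overlap, which is the window available for the handover meeting and the shift report.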

Other Process Items
Involve Outside Groups to Assist: Business people, IT teams, SMEs. Threat modeling, investigations, remediation.
Incorporate Learnings Into the SOC and Organization: Adjust correlation rules or IT configurations, user education, change business processes.
Automate Processes: Security intelligence platform custom UIs to accelerate investigations and alerting, ticketing system.
Have a process for involving business people, other IT and security teams (including red teams), and SMEs outside the SOC to help with threat modeling, incident investigations, and remediation. It is key to have the business people tell you what the mission-critical apps and data are so you can protect them. You may also be able to share machine data or UI access with these other IT teams to help them with their jobs, increase uptime, and improve collaboration.
Have a process so learnings are incorporated back into the SOC, IT security, and the organization: adjust correlation rules in the security intelligence platform, change product settings and configurations, recommend user education, fix unsafe business processes, etc.
Automate processes where possible. Use the security intelligence platform to prioritize alerts and to give incident investigators interfaces that accelerate reviews. For example, a SOC analyst can type an IP or user name into a form box on the UI and get back the relevant information that reflects the playbook, or use a right-click workflow action to grab a PCAP file. Use ticketing systems for workflow and incident management.

Demonstrate SOC Value
Anecdotes of threats defeated
Metrics on events/tickets, resolution time
Regular communication to execs and rest of org
Show reduced business risk via KPIs
SOCs require a significant ongoing investment, so it is key to show the value of the SOC to keep the resources coming. Ongoing metrics could include: total events, total cases opened and closed, total threats remediated, average time to escalate, average time to remediate, and the number of recommendations the SOC has made to the rest of the organization to reduce risk. Show how the SOC has met the original goal of reducing business risk. Communicate periodically to key stakeholders and other groups to promote the value of the SOC, and have meaningful anecdotes and high-level metrics ready for executives.
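Several of the metrics listed above fall straight out of ticket timestamps. The following sketch assumes a hypothetical `Ticket` record with `opened`, `escalated`, and `closed` times; a real ticketing system will have its own schema, so treat the field names as placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    opened: datetime
    escalated: Optional[datetime] = None
    closed: Optional[datetime] = None
    false_positive: bool = False


def mean_delta(deltas):
    """Average of a sequence of timedeltas, or None when it is empty."""
    deltas = list(deltas)
    if not deltas:
        return None
    return sum(deltas, timedelta()) / len(deltas)


def soc_metrics(tickets):
    """Summarize case counts and average handling times."""
    return {
        "cases_opened": len(tickets),
        "cases_closed": sum(1 for t in tickets if t.closed),
        "false_positives": sum(1 for t in tickets if t.false_positive),
        "avg_time_to_escalate": mean_delta(
            t.escalated - t.opened for t in tickets if t.escalated
        ),
        # False positives are excluded so quick dismissals do not
        # flatter the remediation time.
        "avg_time_to_remediate": mean_delta(
            t.closed - t.opened
            for t in tickets
            if t.closed and not t.false_positive
        ),
    }
```

Whether false positives belong in the remediation average is a reporting choice; the sketch leaves them out and counts them separately.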

People

Types of People
Multiple roles with different background, skills, pay levels, personalities
Roles: SOC Director, SOC Manager, SOC Architect, Tier 1 Analyst, Tier 2 Analyst, Tier 3 Analyst, Forensics Specialist, Malware Engineer, Counter-Intel
On-the-job training and mentoring, and external training & certifications
Need motivation via promotion path and challenging work
Operating hours and SOC scope play key role in driving headcount
Need to staff multiple roles, each with a different background, skills, pay level, and personality: SOC architect, SOC manager, tier 1 analyst, tier 2 analyst, tier 3 analyst, malware engineer, forensics specialist, counter-intelligence specialist, content developer, etc.
For tier 2/3 it is helpful to have staff who know the environment well and what “abnormal” looks like, and who are willing to leverage stats to find threats. Provide a promotion path so personnel can move up the tiers.
The staffing model drives headcount. Some third-party sources indicate a minimum of 7 people for 24x7 monitoring; others indicate 10. Another source says at least 2 people are needed for 8x5. At large SOCs (for example at a major defense contractor) there can be 50+ people and more than 3 tiers.
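A rough lower bound on headcount follows from coverage hours alone. The sketch below assumes a 40-hour work week per analyst, which is an assumption of this example and not a figure from the presentation; it ignores leave, training, shift overlaps, and multiple tiers, all of which push the real number higher.

```python
def min_analysts_per_seat(hours_per_day: int, days_per_week: int,
                          weekly_hours_per_analyst: int = 40) -> int:
    """Fewest analysts that keep one seat staffed for the given hours.

    Uses ceiling division on whole hours; the 40-hour default is an
    assumption and should be replaced with local working rules.
    """
    covered = hours_per_day * days_per_week
    return -(-covered // weekly_hours_per_analyst)  # ceiling division
```

For 24x7 coverage this gives 5 analysts for a single seat (168 hours divided by 40, rounded up), and 1 for 8x5, so the third-party minimums quoted above already include headroom beyond bare coverage.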

Different Skillsets Needed
Role/Title: Desired Skills
Tier 1 Analyst: Few years in security, basic knowledge of systems and networking
Tier 2 Analyst: Former Tier 1 experience, deeper knowledge of security tools, strong networking / system / application experience, packet analysis, incident response tools
Tier 3 Analyst: All the above + can adjust the security intelligence platform, knows reverse engineering/threat intelligence/forensics
SOC Director: Hiring and staffing, interfacing with execs to show value and get resources, establishing metrics and KPIs
SOC Architect: Experience designing large scale security operations, security tools and processes

Technology

Need Security Intelligence Platform (SIEM + more!)
Meets Key Needs of SOC Personnel: Monitoring, Correlations, Alerts; Ad Hoc Search & Investigate; Custom Dashboards And Reports; Analytics And Visualization; Developer Platform
Real-time Machine Data: Industrial Control, Authentication, Data Loss Prevention, Web, Vulnerability Scans, Firewall, DHCP/DNS, Mobile, Intrusion Detection, Servers, Custom Apps, Anti-Malware, Network Flows, Storage, Badges, Cloud Apps
External Lookups / Enrichment: Threat Feeds, Asset Info, Employee Data Stores, Applications
You need a security intelligence platform, which is a SIEM plus more. We will come back to that later. In summary, this platform can automatically sift through hundreds or thousands of daily security-related events to alert on, and assign severity levels to, only the handful of incidents that really matter. For these incidents, the platform then enables SOC analysts to quickly research and remediate.
This platform can ingest any type of machine data, from any source, in real time. These sources are listed on the left and flow into the platform for indexing.
The platform should also be able to leverage lookups and external data to enrich existing data. This is shown on the bottom and includes employee information from AD, asset information from a CMDB, blacklists of bad external IPs from third-party threat intelligence feeds, application lookups, and more. Correlation searches can include this external content. For example, the platform can alert you if a low-level employee accesses a file share with critical data, but not if the file share has harmless data. Or it can alert you if a user name belonging to an employee who no longer works for your organization is used. These are especially high-risk events.
A SOC can then perform the use cases on the top right on the data. These use cases cover all the personnel tiers in the SOC, so they can all leverage the platform. They can search through the data, monitor it, and be alerted in real time if search parameters are met. This includes cross-data-source correlation rules, which help find the proverbial needle in the haystack so the SOC only needs to focus on the tiny number of priority incidents hidden among a sea of events. The raw data can be aggregated in seconds for custom reports and dashboards.
The platform should also be one that developers can build on. It uses a well-documented REST API and several SDKs so developers and external applications can directly access and act on the data within it.
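The two enrichment examples in the notes (a low-level employee reaching a critical file share, and a former employee's user name being used) can be sketched as a lookup-then-correlate step. Everything below is illustrative: the dictionaries stand in for an identity source such as AD and an asset source such as a CMDB, and the field names are assumptions, not a platform's actual schema.

```python
# Hypothetical lookup tables standing in for AD and CMDB exports.
IDENTITIES = {
    "alice": {"status": "active", "privileged": True},
    "bob": {"status": "terminated", "privileged": False},
    "carol": {"status": "active", "privileged": False},
}
ASSETS = {
    "fs-finance": {"critical": True},
    "fs-public": {"critical": False},
}


def correlate(event: dict) -> list:
    """Enrich one access event with lookups and return any alerts."""
    identity = IDENTITIES.get(event["user"], {})
    asset = ASSETS.get(event["dest"], {})
    alerts = []
    if identity.get("status") == "terminated":
        alerts.append("terminated account in use")
    if asset.get("critical") and not identity.get("privileged"):
        alerts.append("non-privileged access to critical share")
    return alerts
```

The same raw event produces an alert or not depending on the enrichment: `carol` reading `fs-public` is silent, while `carol` reading `fs-finance` is flagged.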

Enables Many Security Use Cases
Incident Investigations & Forensics
Security & Compliance Reporting
Real-time Monitoring of Known Threats
Detecting Unknown Threats
Fraud Detection
Insider Threat
The security intelligence platform enables all these use cases: put in the data once, then do all of this. In theory it could also extend to non-security use cases for an even stronger ROI.

Flexibility & Performance to Meet SOC Needs
Capability | SIEM | Security Intelligence Platform
Data Sources to Index | Limited | Any technology, device
Add Intelligence & Context | Difficult | Easy
Speed & Scalability | Slow and limited scale | Fast and horizontal scale
Search, Reporting, Analytics | Difficult and rigid | Easy and flexible
Anomaly/Outlier Detection and Risk Scoring | Limited | Flexible
Open Platform | Closed | Open with API and SDKs
This slide comes from many customers that have used and evaluated multiple SIEM technologies. Traditional SIEMs have limitations because:
Only selected data sources can be brought into the system, which is inflexible. It is a challenge to support a diverse environment, especially one with custom devices, applications, or environments.
Queries and reports are slow to come back. Security intelligence platform scalability refers to a flat-file data store (not a structured database), distributed search, and installation on commodity hardware, plus the ability to scale out horizontally to handle the largest and most demanding global SOC needs, with the ability to index over 100 TB a day.
Teams are forced to build a custom reporting suite outside the actual SIEM. Out-of-box functionality looks good but has limited flexibility. Caution: companies that do not need or want customization will see this as a strength, not a weakness.
Traditional SIEMs have limited ability to do anomaly detection and risk scoring, so it is more difficult to find the advanced threats that evade traditional security products because they are not signature based. Anomaly detection helps uncover these threats and their atypical patterns.
SIEMs are often closed platforms with no APIs/SDKs, rigid UIs and configuration settings, and difficulty integrating with other apps in the SOC or IT environment. A security intelligence platform is the opposite, with APIs/SDKs, underlying configurations that are all exposed and adjustable, and a flexible UI in XML that can be customized. SOC teams have the full ability to customize the platform to meet their needs and integrate it with anything else in the SOC.
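To make "deviation from a baseline" concrete, here is a minimal outlier check using a z-score over a history of counts. This is one simple technique chosen for illustration, not the method any particular platform uses, and the threshold of 3 standard deviations is an assumption to be tuned.

```python
from statistics import mean, stdev


def is_outlier(baseline, value, threshold=3.0):
    """Flag a value that sits far from the baseline's mean.

    baseline: at least two past observations, such as daily login counts.
    threshold: distance in standard deviations; 3.0 is an arbitrary default.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold
```

For example, against a baseline of roughly 10 failed logins a day, a day with 40 is flagged and a day with 11 is not. Because nothing here depends on a signature, the check can surface activity that signature-based products miss, at the cost of needing a meaningful baseline.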

Connect the “Data-Dots” to See the Whole Story
Threat Pattern: Delivery, Exploit, Installation; Gain Trusted Access; Upgrade (Escalate), Lateral Movement; Data Gathering; Exfiltration; Persist, Repeat
Threat Intelligence: Attacker, known C2 sites, infected sites, IOC, attack/campaign intent and attribution. Sources: External threat intel, Internal threat intel, Indicators of compromise.
Network Activity/Security: Where they went to, who talked to whom, attack transmitted, abnormal traffic, malware download. Sources: Malware sandbox, Web proxy, NetFlow, Firewall, IDS / IPS, Vulnerability scanner.
Endpoint Activity/Security: What process is running (malicious, abnormal, etc.), process owner, registry mods, attack/malware artifacts, patching level, attack susceptibility. Sources: DHCP, DNS, Patch mgmt, Endpoint (AV/IPS/FW), ETDR, OS logs.
Authorization (User/Roles): Access level, privileged users, likelihood of infection, where they might be in kill chain. Sources: Active Directory, LDAP, CMDB, Operating System, Database, VPN, AAA, SSO.
Threats follow the steps at the top to enter an org and exfiltrate data. To spot this you need to connect the dots as they move through this process, which requires data from the 4 data source categories on the far left. Examples are to the right.
Note: “malware sandbox” includes FireEye and Palo Alto Networks' WildFire technology, which detonates web-based payloads, attachments, and links in a virtual sandbox to see what they do and whether they are malicious. This category is sometimes also called “payload analysis” or “advanced malware detection”. ETDR is Endpoint Threat Detection and Response, an emerging category of next-gen endpoint technology. Cyvera (now part of Palo Alto Networks), Carbon Black (part of Bit9), RSA ECAT, Bromium, and Mandiant MIR fall into this category.
Tell this slide perhaps as a “story” where you start with an alert at the top (threat intel) and then pivot and use the other data sources to complete the investigation. See the appendix slide with a sample story.
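The "pivot" described in the notes amounts to pulling every event that mentions the same host or IP from each data source category and ordering the hits in time. The sketch below uses invented event records and field names (`src`, `dest`, `host`, `time`, `msg`) purely for illustration; the addresses come from documentation and private ranges.

```python
# Hypothetical events, grouped by the four data source categories.
SOURCES = {
    "threat_intel": [
        {"time": "2017-07-07T09:00:00", "dest": "203.0.113.7",
         "msg": "destination IP is on a blacklist"},
    ],
    "network": [
        {"time": "2017-07-07T08:58:10", "src": "10.1.2.3",
         "dest": "203.0.113.7", "msg": "proxy: file download"},
    ],
    "endpoint": [
        {"time": "2017-07-07T08:59:30", "host": "10.1.2.3",
         "msg": "new process started"},
    ],
    "auth": [
        {"time": "2017-07-07T08:30:00", "host": "10.1.2.3",
         "msg": "interactive logon by carol"},
        {"time": "2017-07-07T08:45:00", "host": "10.9.9.9",
         "msg": "interactive logon by alice"},
    ],
}


def build_timeline(pivot: str, sources: dict) -> list:
    """Collect events that mention the pivot value, oldest first."""
    hits = []
    for category, events in sources.items():
        for event in events:
            if pivot in (event.get("src"), event.get("dest"), event.get("host")):
                hits.append((event["time"], category, event["msg"]))
    # ISO 8601 timestamps in one timezone sort correctly as plain strings.
    return sorted(hits)
```

Starting from the blacklisted IP in the threat intel alert, an analyst would pivot on `203.0.113.7` to find the internal host, then pivot on `10.1.2.3` to see the logon, download, and process start in order.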

Other SOC Technologies
Advanced Incident Response Tools: Packet Capture, Disk Forensics, Reverse Malware Tools
Ticketing/Case Management System
Other specialized tools are needed in a SOC: advanced tools for complex incident investigations, and a ticketing system to hand off incidents among the SOC tiers.

Splunk Enterprise A Security Intelligence Platform

Splunk Gives Path to SOC Maturity
Technology that enhances all your SOC personnel and processes
Maturity path, from reactive to proactive:
Search and Investigate
Proactive Monitoring and Alerting
Security Situational Awareness
Real-Time Risk Insight

Splunk Can Complement an Existing SIEM
Scenario 1. INTEGRATION: None. NOTES: May have different data sources going to Splunk vs SIEM.
Scenario 2. INTEGRATION: Splunk feeds SIEM. NOTES: Splunk typically sends just a subset of its raw data to the SIEM.
Scenario 3. INTEGRATION: SIEM feeds Splunk. NOTES: Initially, SIEM connectors are on too many hosts to be replaced.
Functions compared per scenario: LOGGING; INVESTIGATIONS / FORENSICS; CORRELATIONS / ALERTING / REPORTING; COMPLIANCE
In scenario 1 the products are completely standalone. The SIEM alerts, and the SOC analysts then walk over to Splunk for the deep investigation.
In scenario 2 Splunk feeds the SIEM. Usually the SOC analysts are comfortable with the UI and reports of the existing SIEM, so they want it in place for correlations, alerting, and reporting. Splunk is still used for deep investigations.
In scenario 3 the existing SIEM feeds Splunk, but all SOC use cases are done in Splunk. The existing SIEM is only in place because its connectors for bringing in data are already on hundreds or thousands of hosts, so removing or replacing them is difficult. Usually, with time, the organization starts sending data from the sources directly to Splunk, often with the universal forwarder, and eventually the traditional SIEM is retired.

Splunk App for Enterprise Security Pre-built searches, alerts, reports, dashboards, workflow
Dashboards and Reports
Incident Investigations & Management
Statistical Outliers
Asset and Identity Aware
Over 45 pre-built searches, 37 predefined dashboards, and 160 reports supporting common security metrics

Key Takeaways
SOC requires investment in people, process and technology
Splunk Enterprise is a security intelligence platform that can power your SOC
Splunk software makes your SOC personnel and processes more efficient

Next Steps
Splunk Security Advisory Services
Help assess, build, implement, optimize a SOC
Includes people, process, and technology
Can include how to use Splunk within the SOC
Evaluate Splunk Enterprise and the Splunk App for Enterprise Security

Thank You!

Appendix

Ticketing Best Practices
Plan Your Queues
Think of Automating Escalations
Attack/Incident Reports Are Your Receipt
Have a strong ticketing/case management system in place. Think about queues and interaction with groups outside the SOC: if you need to hand a task to a different group, you may need to open a ticket on their system as well. Also determine how to receive tickets and when to open one. Automating escalations is a way for the security intelligence platform to automatically grab relevant data for the ticket. The attack/incident report is the ticket with all the detail.

MSSP Model
PROS | CONS
Around the Clock | Lacks Agility
Higher Visibility of the Threat Landscape | Actionable Alerting
Dedicated Specialties | Does not know your infrastructure

Whiteboard: Splunk SOC/ES Architecture
Points:
Build from previous architecture
Layer in ES components
Cover ES Search Head: function, sizing
Cover TAs: function, benefits
Offload search load to Splunk Search Heads
Auto load-balanced forwarding to Splunk Indexers
Send data from thousands of servers using any combination of Splunk forwarders

Merge the Entity And Adversary Models
Diagram labels: Controls, SSCM, Chef, Audit, Tripwire, AD, Monitor, Graphing, Intel, Exposure, Nmap, Nessus
Ratings shown: High (Tripwire, Chef, AD), Medium (Scans, Intel), Low (Nessus, Graphing); High (Tripwire, Proxy), Medium (DNS, Red Team), Low (IDS/IPS, Outbound)
Adversary stages and tools: Recon (Nmap, OSINT); Delivery (Proxy); Exploitation (Tripwire, IDS/IPS); C2 (DNS, Outbound Mon); Intent (Red Team)

Example: Connecting the “data-dots”
Threat Pattern: Delivery, Exploit, Installation; Gain Trusted Access; Upgrade (Escalate), Lateral movement; Data Gathering; Exfiltration
Data source categories: Threat Intelligence; Auth - User Roles; Host Activity/Security; Network Activity/Security
Events shown: Blacklisted IP; Blacklisted IP; Malware download; Program installation; Malware install; Malware and endpoint execution data; User on machine, link to program and process; Sessions across different access points (web, remote control, tunneled); Continued sessions during abnormal hours, periodicity, patterns, etc.
Legend: Machine data, Traffic data, Abnormal behavior; High confidence event, Med confidence event, Low confidence event
An example of an advanced threat. You need data from the 4 data source categories on the far left in order to connect the dots and see the full activity of the threat.

Sample Job Description – Tier 2/3/CSIRT

Sample Job Description – Tier 1 SOC
