1 Building An Analytics-Enabled Security Operations Ctr (SOC). Mike Munn, Splunk Engineering Manager
2 Who Can Benefit From This PPT?
Primary: Wants to Build a SOC. Secondary: Wants to Enhance Existing SOC; Performs SOC-Like Functions.
Customers who want to build a SOC are the primary audience; others are secondary. But even small orgs with no formal SOC plans can learn from this PPT. The material in this PPT is what our customers across many industries and sizes tend to do. It is just a summary; precise SOC requirements will be different for each organization.
3 What is a Security Operations Center (SOC)?
Centralized location(s) where key IT systems of an organization are monitored, assessed and defended from cyber attacks. PRIMARY GOAL: Reduce risk via improved security. SECONDARY GOALS: Compliance, anti-DDoS attack, fraud detection.
Without a SOC there is often siloed, incomplete visibility, which leads to a weaker security posture. Consolidating the security experts and relevant data into a central location lets threats be spotted faster and makes operations more efficient.
4 Before Building SOC Need to Understand:
Significant upfront and ongoing investment of money and time. Prerequisite is a certain security maturity level. Structure will vary for each organization. Important to prioritize and phase the build-out. Executive-level and business unit support required.
To build a SOC you need basic security products and processes in place and tuned (see SANS 20 for examples), as well as enough skilled people to run a SOC. If you do not have a basic level of maturity, you may need to address this first before building a SOC. Prioritization includes: data sources to onboard (onboard the most critical sources first), which threats to model out and look for, playbooks, people, and staffing hours (start 8x5 and move to 24x7, etc.).
5 Three Interrelated Components of a SOC
Process, Technology, People.
Any SOC comprises people, process, and technology. All three are critical to a successful SOC.
7 Threat Modeling & Playbooks
1. What threats does the organization care about? Intellectual property or customer data loss, compliance, etc. Prioritize based on impact.
2. What would the threat look like? How it would access and exfiltrate confidential data.
3. How would we detect/block the threat? Requires machine data and external context. Searches or visualizations that would detect it (correlated events, anomaly detection, deviations from a baseline, risk scoring).
4. What is the playbook/process for each type of threat? Severity, response process, roles and responsibilities, how to document, how to remediate, when to escalate or close, etc.
This is step one of the SOC build-out and prioritizes where to get started.
1. Could also include DDoS, protecting an asset or person, etc. Business people will help you decide this, perhaps based on the overall dollar amount a specific threat could cost the organization.
2. The "indicators of compromise".
3. Includes machine data to spot the threat (this drives which data sources to prioritize), and also the searches needed to detect it (correlated events, anomaly detection, deviations).
4. This is all the detail on what to do when a specific alert is generated. It will vary based on the threat, but the playbook should have a lot of detail so that when the alert pops up, everyone knows how to deal with it appropriately.
Not shown here, but red team or simulation exercises are helpful to make sure processes work correctly. Red team exercises can also find unknown weaknesses that should be addressed in threat modeling.
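The four threat-modeling questions can be captured as one record per threat, which makes the prioritization step concrete. The following is a minimal sketch, not part of the original deck: the `ThreatModel` fields, the numeric `impact` estimate, and both helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatModel:
    threat: str          # 1. what the organization cares about
    impact: int          # hypothetical business-impact estimate used for ranking
    indicators: list     # 2. what the threat would look like
    detections: list     # 3. searches or visualizations that would detect it
    data_sources: list   # machine data those detections need
    playbook: dict = field(default_factory=dict)  # 4. severity, roles, escalation, etc.


def prioritize(models):
    """Return threat models with the highest-impact threats first."""
    return sorted(models, key=lambda m: m.impact, reverse=True)


def data_sources_to_onboard(models):
    """Return data sources in onboarding order, driven by the prioritized threats."""
    ordered = []
    for model in prioritize(models):
        for source in model.data_sources:
            if source not in ordered:
                ordered.append(source)
    return ordered


if __name__ == "__main__":
    models = [
        ThreatModel("DDoS", 100, ["traffic spike"], ["volume baseline"], ["netflow", "firewall"]),
        ThreatModel("Customer data loss", 900, ["large outbound transfer"],
                    ["bytes-out outlier"], ["proxy", "firewall"],
                    {"severity": "high", "escalate_to": "tier 2"}),
    ]
    print(data_sources_to_onboard(models))
```

Ranking the threats first and deriving the data-source order from them mirrors the point in the notes that threat modeling drives which data sources to prioritize.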
8 Simplified SOC Tiers
ALERTS FROM: Security Intelligence Platform, Help Desk, Other IT Depts.
TIER 1: Monitoring. Opens tickets, closes false positives. Basic investigation and mitigation.
TIER 2: Deep investigations/CSIRT. Mitigation/recommends changes.
TIER 3+ (MINIMIZE INCIDENTS REACHING THEM): Advanced investigations/CSIRT. Prevention. Threat hunting. Forensics. Counter-intelligence. Malware reverser.
This is the basic process/incident flow in a SOC. Incidents come in at top left. They are then processed by the personnel of the different tiers in the SOC. Typically tier 1 analysts are the least skilled analysts. They try to quickly dismiss false positives and, for real incidents, open a ticket and attempt to remediate the incident. If they cannot remediate it or do not fully understand the threat, they can escalate it to the more skilled tier 2 analysts. These tier 2 analysts often use more advanced tools, such as packet capture tools, to research an incident. Tier 2 tries to investigate and remediate all incidents, but if they cannot, they may escalate the incident to the most advanced analysts, the tier 3 analysts. Since tier 3 analysts are the most skilled and expensive, it is key to limit incidents reaching them to the very "difficult" or critical ones.
Notice the responsibilities of the tiers on the right. We will come back to this later and to how the proper technology can help with most of these use cases.
Tier 2/3 can relay feedback into the rest of the org to improve security.
Tier 3 may be part of the incident review process, but in some orgs it is not; it is a separate team within the SOC.
Also, sometimes the CSIRT (Computer Security Incident Response Team) is within the SOC as the tier 2/3 levels, but sometimes it is outside of the SOC and distributed across the organization.
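The escalation flow described in these notes can be sketched as a small routing function. This is an illustrative simplification only: the `Alert` fields and the 1 to 3 `difficulty` scale are assumptions, and a real SOC decides escalation through analyst judgment and its playbooks rather than a single number.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    false_positive: bool
    difficulty: int  # hypothetical scale: 1 = tier 1 can handle, 3 = needs tier 3


def route(alert):
    """Return the handling path of an alert through the SOC tiers."""
    if alert.false_positive:
        return ["tier1: closed as false positive"]
    path = ["tier1: opened ticket"]
    for tier in (1, 2):
        if alert.difficulty <= tier:
            path.append(f"tier{tier}: remediated")
            return path
        path.append(f"tier{tier}: escalated")
    # Only the incidents that tiers 1 and 2 could not resolve reach tier 3.
    path.append("tier3: remediated")
    return path


if __name__ == "__main__":
    for alert in (Alert("port scan", True, 1), Alert("phishing", False, 1),
                  Alert("lateral movement", False, 3)):
        print(alert.name, "->", route(alert))
```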
9 One vs. Multiple Locations
One Location vs. Multiple Locations. Shifts: Morning, Afternoon, Midnight. Locations: West Coast, East Coast, APAC.
Most do one location. One location: better communication and easier continuity and management, but more expensive because a differential has to be paid to employees for the late hours. Multiple locations: harder to work on the same issues, including language issues, but cheaper because there is no need for differential pay.
12 Operational Continuity
Shift Overlaps. Shift Handover Procedures. Shift Reports.
Overlap is key so knowledge is transferred smoothly and the outgoing shift can bring the incoming shift up to speed. Handover is key: everyone gets into a room and shares what is going on, and agrees or disagrees on next steps. The shift report is paperwork that collects many attack reports. It lists cases worked with comments, ongoing attacks, and where they stand.
13 Other Process Items
Involve Outside Groups to Assist: business people, IT teams, SMEs; threat modeling, investigations, remediation.
Incorporate Learnings Into the SOC and Organization: adjust correlation rules or IT configurations, user education, change business processes.
Automate Processes: security intelligence platform custom UIs to accelerate investigations and alerting, ticketing system.
Have a process for involving business people, other IT and security teams (incl. red teams), and SMEs outside the SOC to help with threat modeling, incident investigations, and remediation. It is key to have the business people tell you what the mission-critical apps and data are so you can then protect them. You can perhaps also share machine data or UI access with these other IT teams to help them with their jobs, increase uptime, and improve collaboration.
Have a process so learnings are incorporated back into the SOC, IT security, and the organization: adjust correlation rules in the security intelligence platform, change product settings and configurations, recommend user education, fix unsafe business processes, etc.
Automate processes where possible. Use the security intelligence platform to prioritize alerts and give incident investigators interfaces to accelerate reviews. For example, a SOC analyst can type an IP or user name into a form box on the UI and get back a lot of relevant info that reflects the playbook, or use a right-click workflow action to grab a PCAP file. Use ticketing systems for workflow and incident management.
14 Demonstrate SOC Value
Anecdotes of threats defeated. Metrics on events/tickets, resolution time. Regular communication to execs and rest of org. Show reduced business risk via KPIs.
SOCs require a significant ongoing investment, so it is key to show the value of the SOC to keep the resources coming. Ongoing metrics to show the value of the SOC could include: total events, total cases opened and closed, total threats remediated, average time to escalate, average time to remediate, and the number of recommendations the SOC has made to the rest of the organization to reduce risk. Show how the SOC has met the original goal of reducing business risk. Communicate periodically to key stakeholders and other groups to promote the value of the SOC. Have meaningful anecdotes and high-level metrics ready to show value to executives.
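Several of the metrics listed above can be derived directly from ticket timestamps. The sketch below assumes a hypothetical ticket record with `opened`, `escalated`, and `closed` datetime fields, and treats "time to remediate" as the time from open to close; a real ticketing system will expose different field names and may define these intervals differently.

```python
from datetime import datetime, timedelta
from statistics import mean


def _avg_hours(tickets, end_key):
    """Average hours from 'opened' to end_key, over tickets that have end_key."""
    durations = [
        (t[end_key] - t["opened"]).total_seconds() / 3600
        for t in tickets
        if t.get(end_key) is not None
    ]
    return mean(durations) if durations else None


def soc_metrics(tickets):
    """Summarize case counts and average escalation/remediation times."""
    return {
        "cases_opened": len(tickets),
        "cases_closed": sum(1 for t in tickets if t.get("closed") is not None),
        "avg_hours_to_escalate": _avg_hours(tickets, "escalated"),
        "avg_hours_to_remediate": _avg_hours(tickets, "closed"),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 8, 0)
    sample = [
        {"opened": start, "closed": start + timedelta(hours=2)},
        {"opened": start, "escalated": start + timedelta(hours=1),
         "closed": start + timedelta(hours=4)},
        {"opened": start},  # still open
    ]
    print(soc_metrics(sample))
```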
16 Types of People
Multiple roles with different background, skills, pay levels, personalities: SOC Director, SOC Manager, SOC Architect, Tier 1 Analyst, Tier 2 Analyst, Tier 3 Analyst, Forensics Specialist, Malware Engineer, Counter-Intel. On-the-job training and mentoring, and external training & certifications. Need motivation via promotion path and challenging work. Operating hours and SOC scope play key role in driving headcount.
Need to staff multiple roles, with different background, skills, pay levels, and personalities for each role: SOC architect, SOC manager, tier 1 analyst, tier 2 analyst, tier 3 analyst, malware engineer, forensics specialist, counter-intelligence specialist, content developer, etc. For tier 2/3 it is helpful to have staff who know the environment well and what "abnormal" looks like, and staff who are willing to leverage stats to find threats. Provide a promotion path so personnel can move up the tiers.
Staffing model drives headcount. Some 3rd-party sources indicate a minimum of 7 people are needed for 24x7 monitoring. Others indicate 10 people for 24x7. Another source says for 8x5 at least 2 people are needed. Then again, at large SOCs (for example at a major defense contractor) there can be 50+ people in the SOC and also more than 3 tiers.
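To see why operating hours drive headcount, a rough calculation helps. This sketch assumes a 40-hour analyst week and an optional `availability` factor for leave and training; both are illustrative assumptions, not figures from the deck, and the result is the number of analysts needed to keep a single seat staffed. The third-party figures quoted above may rest on different assumptions.

```python
import math

HOURS_24X7 = 24 * 7  # 168 coverage hours per week
HOURS_8X5 = 8 * 5    # 40 coverage hours per week


def analysts_per_seat(coverage_hours, analyst_hours=40, availability=1.0):
    """Analysts needed to keep one seat staffed for coverage_hours per week.

    availability is the assumed fraction of an analyst's week actually spent
    on shift (for example 0.8 to allow for leave and training).
    """
    effective_hours = analyst_hours * availability
    return math.ceil(coverage_hours / effective_hours)


if __name__ == "__main__":
    print("24x7, one seat:", analysts_per_seat(HOURS_24X7))
    print("24x7, one seat, 80% availability:", analysts_per_seat(HOURS_24X7, availability=0.8))
    print("8x5, one seat:", analysts_per_seat(HOURS_8X5))
```

Multiplying the per-seat figure by the number of seats per shift, and adding management and specialist roles, gives a first headcount estimate.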
17 Different Skillsets Needed
Role/Title: Desired Skills
Tier 1 Analyst: Few years in security, basic knowledge of systems and networking.
Tier 2 Analyst: Former Tier 1 experience, deeper knowledge of security tools, strong networking / system / application experience, packet analysis, incident response tools.
Tier 3 Analyst: All the above + can adjust the security intelligence platform, knows reverse engineering/threat intelligence/forensics.
SOC Director: Hiring and staffing, interfacing with execs to show value and get resources, establishing metrics and KPIs.
SOC Architect: Experience designing large scale security operations, security tools and processes.
19 Need Security Intelligence Platform (SIEM + more!)
Meets Key Needs of SOC Personnel: Monitoring, Correlations, Alerts. Ad Hoc Search & Investigate. Custom Dashboards And Reports. Analytics And Visualization. Developer Platform.
Real-time Machine Data: Industrial Control, Authentication, Data Loss Prevention, Web, Vulnerability Scans, Firewall, DHCP/DNS, Mobile, Intrusion Detection, Servers, Custom Apps, Anti-Malware, Network Flows, Storage, Badges, Cloud Apps.
External Lookups / Enrichment: Threat Feeds, Asset Info, Employee Data Stores, Applications.
Need a security intelligence platform, which is a SIEM plus more. We will come back to that later. In summary, this platform can automatically sift through hundreds or thousands of daily security-related events to alert on, and assign severity levels to, only the handful of incidents that really matter. For these incidents, the platform then enables SOC analysts to quickly research and remediate them.
This platform can ingest any type of machine data, from any source, in real time. These sources are listed on the left and flow into the platform for indexing. The platform should also be able to leverage lookups and external data to enrich existing data. This is shown at the bottom and includes employee information from AD, asset information from a CMDB, blacklists of bad external IPs from 3rd-party threat intelligence feeds, application lookups, and more. Correlation searches can include this external content. So, for example, the platform can alert you if a low-level employee accesses a file share with critical data, but not if the file share has harmless data. Or the platform can alert you if a user name is used that belongs to an employee who no longer works for your organization. These are especially high-risk events.
A SOC can then perform the use cases on the top right on the data. These use cases cover all the personnel tiers in the SOC, so they can all leverage the platform. They can search through the data, monitor the data, and be alerted in real time if search parameters are met. This includes cross-data-source correlation rules, which help find the proverbial needle in the haystack so the SOC only needs to focus on the tiny number of priority incidents that matter, hidden among a sea of events. The raw data can be aggregated in seconds for custom reports and dashboards. The platform should also be one that developers can build on. It uses a well-documented REST API and several SDKs so developers and external applications can directly access and act on the data within it.
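The two enrichment examples in these notes (a low-level employee touching a critical file share, and a former employee's user name in use) can be sketched as a lookup-driven correlation. This is plain Python for illustration, not the platform's search language; the event fields and the `employees` and `assets` lookup tables are hypothetical stand-ins for data from AD and a CMDB.

```python
def correlate(events, employees, assets):
    """Return alerts for file-share events that are high-risk once enriched."""
    alerts = []
    for event in events:
        employee = employees.get(event["user"], {})  # e.g. looked up from AD
        asset = assets.get(event["share"], {})       # e.g. looked up from a CMDB
        if employee.get("status") == "terminated":
            alerts.append({"event": event, "reason": "user name of former employee in use"})
        elif asset.get("critical") and employee.get("level") == "low":
            alerts.append({"event": event, "reason": "low-level employee accessed critical share"})
    return alerts


if __name__ == "__main__":
    employees = {
        "alice": {"status": "active", "level": "low"},
        "bob": {"status": "terminated", "level": "high"},
    }
    assets = {"finance": {"critical": True}, "lunch-menus": {"critical": False}}
    events = [
        {"user": "alice", "share": "lunch-menus"},  # harmless data: no alert
        {"user": "alice", "share": "finance"},
        {"user": "bob", "share": "lunch-menus"},
    ]
    for alert in correlate(events, employees, assets):
        print(alert["reason"], alert["event"])
```

The raw events are identical in shape; only the external context decides which ones become alerts.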
20 Enables Many Security Use Cases
Security Intelligence Platform: Incident Investigations & Forensics. Security & Compliance Reporting. Real-time Monitoring of Known Threats. Detecting Unknown Threats. Fraud Detection. Insider Threat.
The security intelligence platform enables all these use cases. Put in the data once, then do all of this. In theory it could also extend to non-security use cases for an even stronger ROI.
21 Flexibility & Performance to Meet SOC Needs
Data Sources to Index. SIEM: Limited. Security Intelligence Platform: Any technology, device.
Add Intelligence & Context. SIEM: Difficult. Security Intelligence Platform: Easy.
Speed & Scalability. SIEM: Slow and limited scale. Security Intelligence Platform: Fast and horizontal scale.
Search, Reporting, Analytics. SIEM: Difficult and rigid. Security Intelligence Platform: Easy and flexible.
Anomaly/Outlier Detection and Risk Scoring. Security Intelligence Platform: Flexible.
Open Platform. SIEM: Closed. Security Intelligence Platform: Open with API and SDKs.
This slide has come from many customers that have used and evaluated multiple SIEM technologies. Traditional SIEMs have limitations because:
Only selected data sources can be brought into the system, which is inflexible. It is a challenge to support a diverse environment, especially if there are custom devices, applications, or environments.
Queries and reporting are slow, and reports are slow to come back. Security intelligence platform scalability refers to a flat-file data store (not a structured database), distributed search, and installation on commodity hardware. It also refers to the ability to scale out horizontally to handle the largest and most demanding global SOC needs, with the ability to index over 100 TB a day.
Teams are forced to build a custom reporting suite outside of the actual SIEM. Out-of-the-box functionality looks good but has limited flexibility. Caution: companies that do not need or want customization will see this as a strength and not a weakness.
Traditional SIEMs have limited ability to do anomaly detection and risk scoring, so it is more difficult to find the advanced threats that evade detection from traditional security products because they are not signature based. For these, anomaly detection is helpful to uncover them and their atypical patterns.
SIEMs often are closed platforms with no APIs/SDKs, rigid UIs and configuration settings, and difficulty integrating with other apps in the SOC or IT environment. A security intelligence platform is the opposite, with APIs/SDKs, underlying configurations that are all exposed and adjustable, and a flexible UI in XML that can be customized. SOC teams have the full ability to customize the platform to meet their needs and integrate it with anything else in the SOC.
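As an illustration of the anomaly detection mentioned above, the sketch below flags a value that deviates from a baseline by more than a chosen number of standard deviations. The z-score approach and the threshold of 3 are assumptions picked for simplicity; the deck does not say which statistical method a given platform uses.

```python
from statistics import mean, stdev


def is_outlier(baseline, value, threshold=3.0):
    """Return True if value deviates from the baseline by more than threshold sigmas.

    baseline needs at least two observations, e.g. a user's daily outbound
    megabytes over the previous weeks.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    daily_mb_out = [10, 12, 11, 9, 10, 11, 12]
    print(is_outlier(daily_mb_out, 11))  # a typical day
    print(is_outlier(daily_mb_out, 40))  # a possible exfiltration spike
```

Because the check is relative to each entity's own history, it needs no signature for the threat, which is the property the notes highlight.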
22 Connect the “Data-Dots” to See the Whole Story
Threat Pattern: Delivery, Exploit, Installation; Gain Trusted Access; Exfiltration; Data Gathering; Upgrade (Escalate), Lateral Movement; Persist, Repeat.
Threat Intelligence: Attacker, known C2 sites, infected sites, IOC, attack/campaign intent and attribution. Sources: external threat intel, internal threat intel, indicators of compromise.
Network Activity/Security: Where they went to, who talked to whom, attack transmitted, abnormal traffic, malware download. Sources: malware sandbox, web proxy, NetFlow, firewall, IDS / IPS, vulnerability scanner.
Endpoint Activity/Security: What process is running (malicious, abnormal, etc.), process owner, registry mods, attack/malware artifacts, patching level, attack susceptibility. Sources: DHCP, DNS, patch mgmt, endpoint (AV/IPS/FW), ETDR, OS logs.
Authorization – User/Roles: Access level, privileged users, likelihood of infection, where they might be in kill chain. Sources: Active Directory, LDAP, CMDB, operating system, database, VPN, AAA, SSO.
Threats follow the steps at the top to enter an org and exfiltrate data. To spot this you need to connect the dots as they move through this process. To do this you need data from the 4 data source categories on the far left. Examples are to the right.
Note: "malware sandbox" includes FireEye and Palo Alto Networks' WildFire technology, which detonates web-based payloads, attachments, and links in a virtual sandbox to see what they do and whether they are malicious. Sometimes this category is also called "payload analysis" or "advanced malware detection".
ETDR is Endpoint Threat Detection and Response, an emerging category of next-gen endpoint technology. Cyvera (now part of Palo Alto Networks), Carbon Black (part of Bit9), RSA ECAT, Bromium, and Mandiant MIR fall into this category.
Tell this slide perhaps as a "story" where you start with an alert at top (threat intel) and then pivot and use the other data sources to complete the investigation. See the appendix slide with a sample story.
23 Other SOC Technologies
Advanced Incident Response Tools. Ticketing/Case Management System. Packet Capture. Disk Forensics. Reverse Malware Tools.
Other specialized tools are needed in a SOC: advanced tools for complex incident investigations, and a ticketing system to hand off incidents among the SOC tiers.
24 Splunk Enterprise A Security Intelligence Platform
25 Splunk Gives Path to SOC Maturity
Maturity levels, from proactive down to reactive: Real-Time Risk Insight; Security Situational Awareness; Proactive Monitoring and Alerting; Search and Investigate.
Technology that enhances all your SOC personnel and processes.
26 Splunk Can Complement an Existing SIEM
INTEGRATION. Scenario 1: None. Scenario 2: Splunk feeds SIEM. Scenario 3: SIEM feeds Splunk.
Functions compared per scenario: LOGGING; INVESTIGATIONS / FORENSICS; CORRELATIONS / ALERTING / REPORTING; COMPLIANCE.
NOTES. Scenario 1: May have different data sources going to Splunk vs SIEM. Scenario 2: Splunk typically sends just a subset of its raw data to the SIEM. Scenario 3: Initially, SIEM connectors are on too many hosts to be replaced.
In scenario 1 the products are completely standalone. The SIEM alerts and the SOC analysts then walk over to Splunk for the deep investigation.
In scenario 2 Splunk feeds the SIEM. Usually the SOC analysts are comfortable with the UI and reports of the existing SIEM, so they want it in place for correlations/alerting/reporting. Splunk is still used for deep investigations.
In scenario 3 the existing SIEM feeds Splunk, but all SOC use cases are done in Splunk. The existing SIEM is only in place because SIEM connectors to bring in data are on hundreds or thousands of hosts already, so removing or replacing them is difficult. Usually with time the organization will start sending data from the sources directly to Splunk, often with the universal forwarder, and eventually the traditional SIEM is retired.
27 Splunk App for Enterprise Security
Pre-built searches, alerts, reports, dashboards, workflow. Dashboards and Reports. Incident Investigations & Management. Over 45 pre-built searches. 37 predefined dashboards. 160 reports supporting common security metrics. Statistical Outliers. Asset and Identity Aware.
28 Key Takeaways
SOC requires investment in people, process and technology. Splunk Enterprise is a security intelligence platform that can power your SOC. Splunk software makes your SOC personnel and processes more efficient.
29 Next Steps
Splunk Security Advisory Services: Help assess, build, implement, optimize a SOC. Includes people, process, and technology. Can include how to use Splunk within the SOC.
Evaluate Splunk Enterprise and the Splunk App for Enterprise Security.
33 Ticketing Best Practices
Plan Your Queues. Think of Automating Escalations. Attack/Incident Reports Are Your Receipt.
Have a strong ticketing/case management system in place. Think about queues and interaction with groups outside the SOC. If you need to hand a task to a different group, keep in mind you may need to open a ticket on their system as well. Also determine how to receive tickets and when to open a ticket. Automating escalations is a way to have the security intelligence platform automatically grab relevant data for the ticket. The attack/incident report is the ticket with all the detail.
34 MSSP Model
PROS / CONS: Around the Clock; Lacks Agility; Higher Visibility of the Threat Landscape; Actionable Alerting; Dedicated Specialties; Does not know your infrastructure.
35 Whiteboard: Splunk SOC/ES Architecture
Points: Build from previous architecture. Layer in ES components. Cover ES Search Head – Function – Sizing. Cover TAs – Function – Benefits. Offload search load to Splunk Search Heads. Auto load-balanced forwarding to Splunk Indexers. Send data from thousands of servers using any combination of Splunk forwarders.
36 Merge the Entity And Adversary Models
Figure labels: Controls: SSCM, Chef. Audit: Tripwire, AD. Monitor: Graphing, Intel. Exposure: Nmap, Nessus. High: Tripwire, Chef, AD. Medium: Scans, Intel. Low: Nessus, Graphing. High: Tripwire, Proxy. Medium: DNS, Red Team. Low: IDS/IPS, Outbound. Recon: Nmap, OSINT. Delivery: Proxy. Exploitation: Tripwire, IDS/IPS. C2: DNS, Outbound Mon. Intent: Red Team.
37 Example: Connecting the “data-dots”
Threat pattern: Delivery, Exploit, Installation; Gain Trusted Access; Exfiltration; Data Gathering; Upgrade (Escalate), Lateral Movement.
Data source categories: Threat Intelligence; Auth - User Roles; Host Activity/Security; Network Activity/Security.
Events shown: blacklisted IP (twice); malware download; continued sessions during abnormal hours, periodicity, patterns, etc.; malware and endpoint execution data; sessions across different access points (web, remote control, tunneled); program installation; user on machine, link to program and process; malware install.
Legend: machine data, traffic data, abnormal behavior; high confidence event, med confidence event, low confidence event.
An example of an advanced threat. You need data from the 4 data source categories on the far left in order to connect the dots to see the full activity of the threat.
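Connecting events of differing confidence across the four data source categories can be sketched as a per-host risk score. The weights, thresholds, and category names below are hypothetical values chosen for illustration; the slide only distinguishes high, med, and low confidence events.

```python
from collections import defaultdict

# Hypothetical weights for the slide's high / med / low confidence events.
WEIGHTS = {"high": 5, "med": 3, "low": 1}


def score_hosts(events):
    """Aggregate (host, category, confidence) events into per-host risk scores."""
    scores = defaultdict(int)
    categories = defaultdict(set)
    for host, category, confidence in events:
        scores[host] += WEIGHTS[confidence]
        categories[host].add(category)
    return {h: {"score": scores[h], "categories": sorted(categories[h])} for h in scores}


def hosts_to_investigate(events, min_score=8, min_categories=3):
    """Hosts whose events add up across several data source categories."""
    summary = score_hosts(events)
    return sorted(
        host for host, info in summary.items()
        if info["score"] >= min_score and len(info["categories"]) >= min_categories
    )


if __name__ == "__main__":
    events = [
        ("pc-17", "threat_intel", "high"),  # blacklisted IP
        ("pc-17", "network", "med"),        # malware download
        ("pc-17", "host", "low"),           # program installation
        ("pc-42", "network", "low"),        # single low-confidence event
    ]
    print(hosts_to_investigate(events))
```

Requiring several categories as well as a minimum score reflects the point of the slide: no single event tells the whole story, but correlated events across sources do.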
38 Sample Job Description – Tier 2/3/CSIRT
39 Sample Job Description – Tier 1 SOC