Vulnerability Management at Scale

Alexandre Fiori, Production Engineer
Intro: who I am, what I do, where I work.

Talk about the landscape: the different infrastructure, services, and products of each app, and their build/release systems.
Facebook: prod, facilities, corp, EE
WhatsApp, Messenger, Instagram: prod, mobile
Oculus: prod, mobile, embedded

Overview: what's this about
The evolution of the Vulnerability Management program at Facebook, the birth of the Production Engineering team that supports the program, and the pragmatic approach taken to build a highly scalable system for an ever-growing environment.

“Facebook was built on an open source stack. We support and encourage the use and development of open source software and hardware.”
Talk about the pros and cons of open source with regard to security and vulnerabilities.
Pros: public security bulletins and patches.
Cons: security information that impacts the business.

Vulnerability Management Timeline
…2015: PCI DSS
2016: Internal scanning tools
2017: Broader network scanning
2018: Reboot, build up
2019: VMaaS
Take time to go through each bullet point and its short story.
PCI DSS: “Payment Card Industry Data Security Standard”
2015: PCI requires a Vulnerability Management Program to exist
2016: vulnmetrics, aquilles CVE matching
2017: Extending network scanner infra to EE
2018: styx and new pipelines

Tell the story of 2018 onwards: how the reboot of the program happened.
Expand the program (2018)
Bootstrap a new team of 2 ICs and 1 TPM reporting to a director

New challenge
Scan all infrastructure
Merge scanning technologies
Improve vulnerability matching
Start and track remediation
Manage the remediation lifecycle
Design and build a system to support all companies. Work XFN with teams working in silos and merge their technologies. Hire proprietary vulnerability database companies. Train the security operations team to manage remediation.

Architecture

Mindset
Big data
Scalability and reliability
On-line vs off-line pipelines
Concept validation and XFN
Fast prototype and launch
Volume: too much data everywhere; consolidation.
How to scale the system vs people, add features, and maintain the SLA.
On-line traumas from the detection pipelines: rollouts, backfills.
Batches are easier to set up, run, test, scale, and maintain (backfills, etc.).
Validate the concept through XFN work and find early adopters: talk to people.

Talk about the concept of extract, transform, load (ETL) and its parallels with the UNIX philosophy.
ETL

Collect
Process
Report
Talk about the concept of running batch-oriented pipelines, e.g. daily.
Collectors are stateless.
Processors may be stateless or stateful (e.g. they need vulndb, or decorators for domain and first seen).
Reporters are stateful: they need data from previous runs to proceed (e.g. to escalate or clean up).

Collectors
Inventory applications, operating systems, and hardware
Scan asset inventories and print to standard output
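
A minimal sketch of a stateless collector in this style, assuming JSON Lines as the text interface and a hard-coded inventory in place of a real asset inventory query. The INVENTORY rows, the field names, and the "installed_software" collector name are hypothetical, not the actual internal tooling.

```python
import json
import sys
from typing import Iterable, Iterator

# Hypothetical inventory rows; a real collector would query an asset inventory.
INVENTORY = [
    {"host": "web001", "product": "openssl", "version": "1.0.2k"},
    {"host": "web002", "product": "nginx", "version": "1.14.0"},
]


def collect(rows: Iterable[dict], collector: str) -> Iterator[str]:
    """Format each inventory row as one JSON line tagged with the collector name.

    No state is read or written, so reruns and backfills produce the same output.
    """
    for row in rows:
        yield json.dumps({"collector": collector, **row}, sort_keys=True)


if __name__ == "__main__":
    # Print to standard output so the next tool in the pipeline can consume it.
    for line in collect(INVENTORY, collector="installed_software"):
        sys.stdout.write(line + "\n")
```

Because the collector only prints, it can be tested, rerun, and backfilled without touching any downstream state.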

Processors
Aggregate
Scan vulnerabilities
Normalize
Pre-process, scan for vulnerabilities, post-process.
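
As an illustration of the normalize and scan steps, the sketch below lowercases product names, applies an alias table, and matches versions against half-open ranges loosely modeled on NVD-style start/end version bounds. The ALIASES table, the VULNS entries, and their placeholder CVE IDs are hypothetical. The naive version comparison ignores letter suffixes; a real matcher needs ecosystem-specific version rules.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical alias table used during normalization.
ALIASES: Dict[str, str] = {"openssl-libs": "openssl", "nginx-core": "nginx"}

# Hypothetical vulnerability entries with placeholder CVE IDs.
# Ranges are half-open: start inclusive, end exclusive.
VULNS: List[dict] = [
    {"cve": "CVE-0000-0001", "product": "nginx", "start": "1.0.0", "end": "1.16.1"},
    {"cve": "CVE-0000-0002", "product": "openssl", "start": "1.1.0", "end": "1.1.1"},
]


def version_key(version: str) -> Tuple[int, ...]:
    """'1.14.0' -> (1, 14); '1.0.2k' -> (1, 0, 2). Keeps leading digits only."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        parts.append(int(m.group()) if m else 0)
    while parts and parts[-1] == 0:
        parts.pop()  # so that '1.1' and '1.1.0' compare equal
    return tuple(parts)


def normalize(record: dict) -> dict:
    """Lowercase and de-alias the product name."""
    name = record["product"].strip().lower()
    return {**record, "product": ALIASES.get(name, name)}


def match(records: Iterable[dict], vulns: List[dict] = VULNS) -> Iterator[dict]:
    """Yield one finding per (record, CVE) pair whose version falls in range."""
    for record in map(normalize, records):
        v = version_key(record["version"])
        for vuln in vulns:
            if (vuln["product"] == record["product"]
                    and version_key(vuln["start"]) <= v < version_key(vuln["end"])):
                yield {**record, "cve": vuln["cve"]}
```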

Reporters
Validate
Escalate
Notify
Track
Cleanup
Done
Reporters manage the escalation lifecycle.
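
Reporters are stateful because each run is compared with the previous one. A minimal sketch, assuming findings are keyed by a hypothetical (host, cve) pair and that the previous run's state is kept as a JSON list; where that text is persisted (file, table) is left open.

```python
import json
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # hypothetical finding key: (host, cve)


def parse_state(text: str) -> Set[Key]:
    """Load the previous run's open findings; empty text means a first run."""
    if not text.strip():
        return set()
    return {(host, cve) for host, cve in json.loads(text)}


def dump_state(keys: Set[Key]) -> str:
    """Serialize open findings deterministically for the next run."""
    return json.dumps(sorted(keys))


def reconcile(previous: Set[Key], current: Iterable[Key]) -> Dict[str, Set[Key]]:
    """Split today's findings into lifecycle buckets relative to the last run."""
    current_set = set(current)
    return {
        "new": current_set - previous,       # open tasks and notify owners
        "open": current_set & previous,      # keep tracking; candidates for escalation
        "resolved": previous - current_set,  # no longer detected: clean up tasks
    }
```

After each run, dump_state of the current findings becomes the input to parse_state on the next run, which is what lets a daily batch job escalate and clean up.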

Inventory Classes
Hosted software
Installed software
Running software
Network scanners
Hardware
The network scanners category already existed but was not integrated. Hosted software and installed software came first.

Vulnerability Database
Public vs proprietary
Multi-vendor system
General purpose datasets
Specialized for ecosystem
Standard format for product->vulnerability matching
Proprietary: licensing, restricted use.
General purpose databases: NVD.
OS-oriented: Apple, Red Hat (IBM?), Microsoft.
Language / ecosystem: npm, PyPI, NuGet.
Specialized per ecosystem also includes first-party: our own CVEs.

Implementation

Design Principles
Command line tools do one thing
Communicate over a text interface
Core functionality shared as libraries
Composable code, tools, and pipelines
Rely on well-established UNIX conventions
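
To illustrate core functionality shared as a library behind a thin text-interface wrapper, here is a hypothetical aggregation filter: the library function groups JSON Lines records by (product, version), and main() only connects it to standard input and output. The record fields are assumptions carried over from a collector emitting JSON Lines.

```python
import json
import sys
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def aggregate(lines: Iterable[str]) -> Iterator[str]:
    """Library function: group JSON Lines records by (product, version).

    Emits one JSON line per group with the sorted list of hosts, so the
    output is again plain text that the next tool in the pipe can read.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input stream
        record = json.loads(line)
        groups[(record["product"], record["version"])].append(record["host"])
    for (product, version), hosts in sorted(groups.items()):
        yield json.dumps({"product": product, "version": version, "hosts": sorted(hosts)})


def main() -> None:
    """Thin CLI wrapper: stdin in, stdout out, nothing else."""
    for out in aggregate(sys.stdin):
        print(out)
```

Installed as a script that calls main(), it could sit in a pipeline such as `collector | aggregate | jq .`; the `collector` and `aggregate` command names are hypothetical.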

Industry Standard Technology
MITRE / NIST / NVD
Common Platform Enumeration (CPE)
Common Vulnerabilities and Exposures (CVE)
Common Weakness Enumeration (CWE)
Common Vulnerability Scoring System (CVSS)
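
CPE 2.3 formatted strings are colon-separated: "cpe:2.3:" followed by eleven attributes (part, vendor, product, version, update, edition, language, sw_edition, target_sw, target_hw, other), with a backslash quoting special characters such as a literal colon. The sketch below parses that binding and performs a deliberately simplified match in which "*" (ANY) matches any value and every other value must be equal. The full CPE name matching specification also handles NA ("-") and embedded wildcards, which this sketch does not.

```python
from typing import Dict, List

CPE23_ATTRIBUTES = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]


def split_cpe(cpe: str) -> List[str]:
    """Split on colons that are not escaped by a backslash."""
    fields: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(cpe):
        ch = cpe[i]
        if ch == "\\" and i + 1 < len(cpe):
            current.append(cpe[i:i + 2])  # keep the escape sequence as-is
            i += 2
            continue
        if ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def parse_cpe23(cpe: str) -> Dict[str, str]:
    """Map a CPE 2.3 formatted string to its eleven named attributes."""
    fields = split_cpe(cpe)
    if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_ATTRIBUTES, fields[2:]))


def cpe_matches(pattern: str, name: str) -> bool:
    """Simplified match: '*' in the pattern matches anything; other values must be equal."""
    p, n = parse_cpe23(pattern), parse_cpe23(name)
    return all(pv == "*" or pv == n[k] for k, pv in p.items())
```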

Infrastructure
Tools
Services
Data warehouse
Dashboards
Notifications
Tools: our own, internal, open source, and generic (grep, sort, jq).
Services: Tupperware tailer, vulndb Thrift service.
We use the data warehouse to leverage orchestration, a shared filesystem, and persistent storage; it also provides operational dashboards.
Our system's data is exposed to partners/customers via tables and dashboards.
Notifications include cases, tasks, and alerts.

What’s missing in this diagram:
Internal shenanigans for Tupperware and fbpkg
Post-processing and remediation pipelines
$repo2csv: maven, munki, choco, yum/repoquery

Internal Pipelines
Tupperware
Container images and packages
First-party vs third-party codebase
Bad Binary Hunter and Buck
Attribution from package to service
None of this is shown in the diagram slide.
The Tupperware pipeline uses a tailer plus a batch job to query all schedulers.
Another pipeline scans internal packages for traces of third-party dependencies, since internal packages shipped to the infrastructure contain third-party code.
Bad Binary Hunter (BBH) scans internal packages using Buck and reports their third-party dependencies.
Internal packages are then flagged with the vulnerabilities of their third-party dependencies.

Vulnerability Database Tools
nvdsync and $vendor2nvd
vulndb command line tool
Uses NVD CVE JSON 1.0 format
Manages versioned datasets backed by MySQL
Supports vendor snapshots, custom CVEs, and snoozes
vulndb Thrift service
CVE lookup and CPE matching
nvdsync is open source: an rsync-like tool for NVD datasets.
$vendor2nvd downloads vendor databases and converts them to NVD CVE JSON 1.0 format.
vulndb stores NVD datasets, organized by vendor, and can be open sourced.
It imports vendor snapshots and exports merged datasets with custom CVEs (custom edits/fixes).
It stores snoozes used by post-processors (well, grep -vf) to snooze certain CVEs per provider and collector.
The Thrift service supports UIs and on-demand CVE matching (e.g. vulnquery, “linters”).
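
The snooze step is described as essentially grep -vf. A hedged Python equivalent of the fixed-string variant (grep -vFf), assuming snoozes are plain strings such as CVE IDs scoped by a (provider, collector) key; the SNOOZES table and its scope names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical snooze table keyed by (provider, collector); each entry is a
# fixed string (for example a CVE ID) to suppress in that scope only.
SNOOZES: Dict[Tuple[str, str], List[str]] = {
    ("nvd", "installed_software"): ["CVE-0000-0002"],
}


def apply_snoozes(lines: Iterable[str], patterns: List[str]) -> Iterator[str]:
    """Like `grep -vFf snoozes.txt`: keep lines that contain none of the patterns."""
    # grep treats an empty pattern as matching every line; skip it here so a
    # stray blank line in the snooze list cannot hide all findings.
    active = [p for p in patterns if p]
    for line in lines:
        if not any(p in line for p in active):
            yield line


def snooze(lines: Iterable[str], provider: str, collector: str) -> Iterator[str]:
    """Apply only the snoozes registered for this provider and collector."""
    return apply_snoozes(lines, SNOOZES.get((provider, collector), []))
```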

Decoration
CWE and CVSS
Domain and sub-domain
First seen
Backlog vs influx
Owner (on-call ID or UNIX username)
Threat Intelligence
Decoration is post-processing.
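
A sketch of two decorators under stated assumptions: the severity labels follow the CVSS v3.x qualitative rating scale (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0), while the first-seen history dict and the cutoff-date rule separating backlog from influx are hypothetical choices, not the program's documented definitions.

```python
from datetime import date
from typing import Dict, Tuple

Key = Tuple[str, str]  # hypothetical finding key: (host, cve)


def cvss3_severity(score: float) -> str:
    """CVSS v3.x qualitative severity rating scale."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def decorate(finding: dict, first_seen: Dict[Key, date], today: date, cutoff: date) -> dict:
    """Add severity, first-seen date, and a backlog/influx bucket to a finding.

    Hypothetical rule: findings first seen before `cutoff` are backlog,
    anything first seen on or after it is influx.
    """
    key = (finding["host"], finding["cve"])
    seen = first_seen.setdefault(key, today)  # remember the earliest sighting
    return {
        **finding,
        "severity": cvss3_severity(finding["cvss"]),
        "first_seen": seen.isoformat(),
        "bucket": "backlog" if seen < cutoff else "influx",
    }
```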

Remediation
Starts from the CVE inventory
Depends on decoration data
Supports a feedback loop (e.g. snoozes)
Understands release cycles per inventory class
Manages escalation and lifecycle
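
One way escalation could depend on both severity and the release cycle of each inventory class is sketched below. The SLA_DAYS, RELEASE_GRACE_DAYS, and LADDER values are entirely hypothetical; the sketch only shows the shape of the logic: an age-based deadline per severity, extra grace for slower release cycles, and one escalation step per additional SLA period overdue.

```python
from datetime import date

# Hypothetical remediation SLAs in days, per severity.
SLA_DAYS = {"Critical": 7, "High": 30, "Medium": 90, "Low": 180}

# Hypothetical extra grace per inventory class, reflecting slower release cycles.
RELEASE_GRACE_DAYS = {"hosted_software": 0, "installed_software": 7, "mobile": 14}

# Hypothetical escalation ladder, from least to most severe action.
LADDER = ["open task", "notify owner", "escalate to on-call", "escalate to management"]


def escalation_step(severity: str, inventory_class: str, first_seen: date, today: date) -> str:
    """Pick the escalation action for a finding based on how overdue it is."""
    sla = SLA_DAYS[severity]
    deadline = sla + RELEASE_GRACE_DAYS.get(inventory_class, 0)
    age = (today - first_seen).days
    if age <= deadline:
        return LADDER[0]
    # Each additional SLA period past the deadline moves one step up the ladder.
    overdue_periods = (age - deadline - 1) // sla + 1
    return LADDER[min(overdue_periods, len(LADDER) - 1)]
```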

Lessons Learned
Normalization, aggregation, and blackholes
Per-customer decision trees are burdensome
Handling delays, XFN work, and fine tuning
Tasks and notification updates are annoying
False positives can compromise credibility

2019: VMaaS

Goals
Self-service system for vulnerability scanning
Custom aggregation defined by collectors
Configurable providers and thresholds
Tier-based service, starting from bronze
Default dashboards and reports per tier
Common CVE inventory for all customers

Progress
XFN partnership with early adopters
Migrated “hosted software” inventory class
Total of 10+ collectors in operation
Customers fixing bogus CVEs in the database
Snoozes effectively helping fine-tune reports

Next up
Migrate other inventory classes
Improve data quality and detection speed
Tackle backlog via remediation campaigns
Tackle influx via push-blocking scans
Influence company culture outside the security org
Migration: brainstorming graph / CPE attribution.
Data quality: better ranking system and risk analysis.
Campaigns to tackle backlog *and* emergencies.
Influx: linters* or push-blocking tests that run scans.
Influence culture: educate through bootcamp, support groups, and campaigns.

Thank you
