Real-time Ingestion of Telemetry into Hadoop to Respond to Zero-Day Attacks
Vipul Sawant, Pallav Jakhotiya
Telemetry
Sputnik
First man-made satellite
Launched by Soviet Union in 1957
Altitude 500 km above the earth
Transmitted telemetry as “beeps”
Logs/Telemetry in the security world
Firewall: Blocked connections; Inbound/outbound traffic; Protocol tunneling activity; Data ingress/egress volumes; Accesses to cloud apps; Firewall software settings
Endpoint: Blocked attacks; Network connections; Successful / failed logins; Sensitive documents accessed; Process behaviors; Endpoint security settings
Server: Blocked attacks; Network connections; Successful / failed logins; Process behaviors; Compliance status (PCI, HIPAA); Server security settings
Gateway: Blocked /web attacks metadata; Source server identity; Web connection history; In/outbound attachments; Gateway security settings
Engineering Challenges
Variety: Structured and unstructured formats
Volume: Terabytes of data generated every day
Velocity: Millions of events generated per second
Latency: Act on the events in near real time
Analyzing Ingestion Techniques
Technique | Throughput | Remarks
Streaming | 100K-110K events/second | Works well for real-time computations
Batch | 230K-250K events/second | Latency of 3-4 hours before data is available
Hybrid | 600K-620K events/second | Uses streaming for computations and a periodic batch for ingestion
Environment: 8-node shared test cluster running HDFS, MapReduce, Storm, Kafka
Hybrid Ingestion
Components: Telemetry Generators, Telemetry Gateway, Message Broker, Distributed Real-Time Computation System, Real-Time Applications, Batch Ingestion, HDFS, Batch Applications
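The hybrid idea (streaming for computations, a periodic batch for ingestion) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the class name `HybridIngestor`, the callbacks `on_event` and `write_batch`, and the thresholds are hypothetical, and plain Python callables stand in for the message broker, the real-time computation system, and HDFS.

```python
import time
from typing import Callable, Dict, List

Event = Dict[str, object]


class HybridIngestor:
    """Hands each event to a real-time handler and buffers it for batch writes."""

    def __init__(
        self,
        on_event: Callable[[Event], None],           # stand-in for the streaming path
        write_batch: Callable[[List[Event]], None],  # stand-in for a batch write to storage
        max_batch: int = 1000,                       # hypothetical size threshold
        max_age_s: float = 60.0,                     # hypothetical time threshold
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._on_event = on_event
        self._write_batch = write_batch
        self._max_batch = max_batch
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[Event] = []
        self._last_flush = clock()

    def ingest(self, event: Event) -> None:
        # Streaming path: the event is available for computation immediately.
        self._on_event(event)
        # Batch path: the event is stored only when the buffer is flushed.
        self._buffer.append(event)
        too_big = len(self._buffer) >= self._max_batch
        too_old = self._clock() - self._last_flush >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        # Write whatever is buffered as one batch, then start a fresh buffer.
        if self._buffer:
            self._write_batch(self._buffer)
            self._buffer = []
        self._last_flush = self._clock()
```

In this sketch the real-time handler sees every event with no batching delay, while storage receives fewer, larger writes. That is one plausible reading of why a hybrid design could sustain higher ingestion throughput than pure streaming, not a description of the presenters' implementation.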
So what problems can we solve with this type of data?
Identify targeted attacks
Machine X logged into 15 other machines in the last hour… lateral movement?
Help companies scope attacks and recover
The attackers initially compromised machine A, then pivoted to B and G. They accessed employee-list.doc, but did not access customer.doc.
Help companies protect their information
Lily usually accesses 2-4 sensitive documents/day, but today she’s already accessed 19 confidential documents! Hmm.
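Two of the examples above (one machine logging into many others within an hour, and a user exceeding their usual document access count) reduce to simple counting checks. The sketch below is illustrative only: the names `LateralMovementDetector` and `exceeds_baseline`, the threshold of 15 logins, and the multiplier of 3 are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class LateralMovementDetector:
    """Flags a source machine that logs into many distinct targets within a time window."""

    def __init__(self, window_s: float = 3600.0, threshold: int = 15) -> None:
        self.window_s = window_s      # look-back window in seconds (one hour here)
        self.threshold = threshold    # hypothetical cutoff for distinct targets
        self._logins: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)

    def observe(self, ts: float, source: str, target: str) -> bool:
        """Record one login and return True if the source now looks suspicious."""
        logins = self._logins[source]
        logins.append((ts, target))
        # Drop logins that fell out of the window (assumes non-decreasing timestamps).
        while logins and logins[0][0] <= ts - self.window_s:
            logins.popleft()
        distinct_targets = {t for _, t in logins}
        return len(distinct_targets) >= self.threshold


def exceeds_baseline(today_count: int, usual_max: int, factor: float = 3.0) -> bool:
    """Return True when today's count is well above the user's usual maximum."""
    return today_count > usual_max * factor
```

With these hypothetical settings, a user who usually accesses at most 4 sensitive documents a day and has accessed 19 today would be flagged, since 19 is greater than 4 x 3.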
Responding to Zero-Day Attacks
Batch Application
Telemetry Archive: Login Attempts, Application Versions, Application Usage Pattern, Login Time, Running Process, AV Detections
Extract Features
Compute Correlations
Decision Tree
Real-Time Application
Real-Time Telemetry
Pre-Processing
Rules Engine
Possible Attack Incidents
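One way to picture how a decision tree could drive a real-time rules engine is sketched below. This is an assumption-laden illustration, not the presenters' design: the node types `Leaf` and `Split`, the function `classify`, the feature names, and every threshold in `EXAMPLE_TREE` are hypothetical. The tree is written by hand here, where a real one would be learned from archived telemetry.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Leaf:
    label: str  # final decision, e.g. "benign" or "possible_attack"


@dataclass(frozen=True)
class Split:
    feature: str      # name of the pre-processed feature to test
    threshold: float  # go "low" when value <= threshold, otherwise "high"
    low: "Node"
    high: "Node"


Node = Union[Leaf, Split]


def classify(node: Node, features: Dict[str, float]) -> str:
    """Walk the tree from the root to a leaf for one pre-processed event."""
    while isinstance(node, Split):
        value = features.get(node.feature, 0.0)  # missing features default to 0
        node = node.low if value <= node.threshold else node.high
    return node.label


# Hypothetical tree: many login attempts plus at least one AV detection is flagged.
EXAMPLE_TREE: Node = Split(
    feature="login_attempts",
    threshold=10,
    low=Leaf("benign"),
    high=Split(
        feature="av_detections",
        threshold=0,
        low=Leaf("benign"),
        high=Leaf("possible_attack"),
    ),
)
```

Keeping the tree as plain data means a batch job could, in principle, replace it periodically without changing the code that evaluates incoming events.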
Central Security Big Data Store
Known C&C Server Implicated in Targeted Attack
Figure labels: Company 1, Company 2, Server, calendar (Mon 8 Feb)
“Seed” Indicator
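As a rough sketch of how a seed indicator might be used, the function below searches connection records for machines that contacted a known indicator, grouped by company. The record layout `(company, machine, destination)` and the function name `machines_touching` are assumptions made for illustration. A real central store would be queried with its own tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (company, machine, destination contacted)
Connection = Tuple[str, str, str]


def machines_touching(seed: str, connections: Iterable[Connection]) -> Dict[str, List[str]]:
    """Return, per company, the machines that contacted the seed indicator."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for company, machine, destination in connections:
        if destination == seed:
            hits[company].add(machine)
    # Sort machine names so the result is stable across runs.
    return {company: sorted(machines) for company, machines in hits.items()}
```

Machines found this way could then become starting points for further searches, which is one interpretation of the word "seed" on this slide.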
Questions?
Thank You!