Learning Communication Rules. Srikanth Kandula, Ranveer Chandra, and Dina Katabi.



Network Admins Are Groping in the Dark
Focus on traffic volume
– TCP=80%, HTTP=30%
– Adapt report categories (e.g., AutoFocus): much traffic from ports
But what's going on?
– Does traffic follow the plan?
– Misconfigurations
– Suspicious traffic
– (Active) user browsing the web, reading/sending mail
– (Automatic) SMS scan on a network, Outlook refresh
Besides focusing on volume, learn the rules underlying the traffic

If you could learn such rules directly from a trace (e.g., http and DNS), you could:
Infer the actual behavior of applications
– AFS root servers direct traffic to volume servers evenly
– Mail to the incoming MX is forwarded on to group MXes
Notice misconfigurations and badness
– These clients should not be talking on known command-control ports
– This server should not be responding to DHCP requests
– This mail server should not attempt connections to non-existent MXes
Rule: whenever flow Y happens, flow X is likely to occur
[Figure: occurrences of flow X and flow Y along a time axis t]

Report all significant rules with no specific knowledge about a trace

Mining for Rules is Hard
How to define significance?
– When is a group of flows interesting enough to report?
Avoid observer bias, but cannot evaluate everything
– Focus on one server, and you miss what you are not looking for
Be practical: deal with noise, search quickly
eXpose
1. A scoring function for significance
2. Heuristics that bias search toward high hit-rate
3. Empirical validation on enterprise traces

Overview
Packet trace to activity matrix
o Rows are 1s windows; columns are flows
o Is the flow active in [time i-1, time i)? (at least one packet)
Association rule mining (X, Y are random variables for columns)
Need not worry about interleaving
Dependencies are at these time-scales (an RTT, a server response)
All windows in the [.25s, 2s] range yield similar rules
[Figure: Packet Trace → Activity Matrix (rows time 1 … time R; columns flow 1 … flow K) → Rules]
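The packet-trace-to-activity-matrix step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Packet` record, the `activity_matrix` function, and the string flow identifiers are hypothetical names introduced here, and windows are measured from the first packet in the trace.

```python
from collections import namedtuple

# Hypothetical packet record: a timestamp in seconds and a flow identifier.
Packet = namedtuple("Packet", ["time", "flow"])


def activity_matrix(packets, window=1.0):
    """Return (flows, rows); rows[i][j] is 1 if flow j sent at least
    one packet during the i-th half-open time window, else 0."""
    if not packets:
        return [], []
    flows = sorted({p.flow for p in packets})
    column = {flow: j for j, flow in enumerate(flows)}
    start = min(p.time for p in packets)
    end = max(p.time for p in packets)
    n_rows = int((end - start) // window) + 1
    rows = [[0] * len(flows) for _ in range(n_rows)]
    for p in packets:
        # Floor division places each packet in a [start, start + window) bin.
        i = int((p.time - start) // window)
        rows[i][column[p.flow]] = 1
    return flows, rows
```

Each column of `rows` is then one binary variable per flow, which is the input that association rule mining works on.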

Which Rules are Significant?
High joint probability?
o X, Y may each occur very often individually (e.g., breeze, sun shining)
High conditional probability?
o Say Y occurs only when X does, but both are rare (e.g., lottery, buy a jet)
We use mutual information (combines the two)
* Measures the fraction of change in Y due to X
* Trades off dependency and frequency
  Score = 0 if Y is independent of X
  Score = Max if Y is fully dependent on X
* Encodes directionality
[Figure: rule between X and Y, with Kerberos and Reservation as the example]
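A score with these properties can be sketched with the standard J-measure. The backup slides name a "Modified JMeasure"; the modifications are not given here, so the sketch below implements only the unmodified textbook form, and the function name `j_measure` and the row/column layout are assumptions.

```python
import math


def j_measure(rows, x, y):
    """Standard J-measure for the rule X => Y, where x and y are column
    indices into a binary activity matrix (a list of 0/1 rows)."""
    n = len(rows)
    n_x = sum(r[x] for r in rows)
    if n == 0 or n_x == 0:
        return 0.0
    p_x = n_x / n
    p_y = sum(r[y] for r in rows) / n
    p_y_given_x = sum(1 for r in rows if r[x] and r[y]) / n_x

    def term(p_cond, p_prior):
        # p * log2(p / q), taken as 0 when p is 0.
        if p_cond == 0.0:
            return 0.0
        return p_cond * math.log2(p_cond / p_prior)

    # Weighting by P(X) is what trades dependency off against frequency.
    return p_x * (term(p_y_given_x, p_y) + term(1 - p_y_given_x, 1 - p_y))
```

The score is 0 when Y is independent of X, and it is directional: swapping `x` and `y` generally gives a different value.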

Modifying Scores for Networking
Negative correlation
– Flows with little overlap
– P(Y|X) << 1 leads to a high score
Long-running flows
– Large downloads, ssh/remote desktop
– Trivial overlaps with a long flow
– Distinguish new vs. present
– Present rules reported only if there is a small mismatch in frequency
Too many possibilities
– Bias: focus on pairs with at least one common IP
– Misses some rules, but hit-rate goes up 1000x and costs go down 10x
[Figure: timelines of flows X and Y]
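The search bias toward pairs with at least one common IP can be sketched as a simple filter over flow pairs. The 4-tuple flow layout `(ip_a, port_a, ip_b, port_b)` and the names `shares_ip` and `candidate_pairs` are assumptions made for illustration.

```python
from itertools import combinations


def shares_ip(f, g):
    """True if flows f and g have at least one endpoint IP in common.
    Each flow is a hypothetical tuple (ip_a, port_a, ip_b, port_b)."""
    return bool({f[0], f[2]} & {g[0], g[2]})


def candidate_pairs(flows):
    """Keep only the flow pairs worth scoring under the common-IP bias."""
    return [(f, g) for f, g in combinations(flows, 2) if shares_ip(f, g)]
```

Pairs that share no endpoint are never scored, which is how some rules get missed in exchange for a much smaller search space.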

Generics
– Miss, if no client accesses the server often
+ Rules that abstract away parts of a flow
[Figure: the rule over Client : Server and Server : Database becomes * : Server and Server : Database (any client); the rule over Client : Rsrv. and Client : Kerberos becomes * : Rsrv. and * : Kerberos (any client, but same on both sides)]
To do this automatically
– What to abstract? (IP addresses at the non-server port)
– Which pairs to consider for a rule? Flows match on IP; generics match on the abstracted IP
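Abstracting the IP address at the non-server port can be sketched as below. The `SERVER_PORTS` set, the `to_generic` name, and the flow tuple layout are all hypothetical; how eXpose actually identifies the server side is not stated in the slides. The sketch also does not cover the "same on both sides" constraint.

```python
# Hypothetical set of ports treated as server ports for this example.
SERVER_PORTS = {25, 53, 80, 88, 135}


def to_generic(flow):
    """Replace the IP at the non-server port with '*'.
    A flow is a hypothetical tuple (ip_a, port_a, ip_b, port_b)."""
    ip_a, port_a, ip_b, port_b = flow
    a_is_server = port_a in SERVER_PORTS
    b_is_server = port_b in SERVER_PORTS
    if a_is_server == b_is_server:
        # Ambiguous (both or neither look like servers): leave unchanged.
        return flow
    if b_is_server:
        return ("*", port_a, ip_b, port_b)
    return (ip_a, port_a, "*", port_b)
```

Flows from different clients to the same server port then collapse into one generic, so a rule can be learned even when no single client accesses the server often.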

Rule Mining
Techniques extend to arbitrary sized rules
– Cost: O(f^2) for pair-wise rules; O(f^(n+1)) for larger rules
Instead,
1. Focus on pair-wise rules (simpler is likelier)
2. Group similar rules
– Eliminate weak rules between strongly connected groups
– Transitive closure to read off clusters
Recursive Spectral Partitioning (VKV00) digests flows into rule clusters
[Figure: Mining for Rules → Rule Score → rule clusters]
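Reading clusters off by transitive closure can be sketched with a union-find structure. Note the substitution: the slides eliminate weak rules with recursive spectral partitioning, whereas this sketch simply drops rules below a fixed `min_score` threshold. The `rule_clusters` name and the `(x, y, score)` rule layout are assumptions.

```python
def rule_clusters(rules, min_score):
    """Group flows into clusters: flows joined by any chain of rules
    scoring at least min_score end up in the same cluster.
    rules is an iterable of hypothetical (x, y, score) tuples."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for x, y, score in rules:
        if score >= min_score:  # stand-in for spectral pruning of weak rules
            parent[find(x)] = find(y)

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(clusters.values(), key=sorted)
```

Flows that appear only in dropped rules do not show up in any cluster.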

Recap: eXpose Mines for Rules
[Figure: Packet Trace → Activity Matrix (rows time 1 … time R; columns flow 1 … flow K, each split into present | new) → Rules (e.g., flow i.new, flow j.present) → Rule Clusters]
Contributions
Learn all significant rules without prior knowledge
o Scoring function for rule significance
o Avoids observer bias, yet stays feasible by focusing on high hit-rate
o Algorithms to mine and prune

Related Work
– Semi-Automated Discovery of App. Session Structure (KJPK06)
– Sherlock (Diagnosing Performance Problems, BCGKMZ07)
– Autofocus (ESV03)
– BLINC (KPF05)
– Stepping Stones (ZP00)
Learn all significant rules without prior knowledge
o Avoids observer bias, yet stays feasible by focusing on high hit-rate
o Scoring function for rule significance
o Algorithms to mine and prune

Results

Evaluation Setup
Traces at access and internal server-facing links
– Packet headers, connection records (Bro), some anonymized
Operational networks with 10^3 clients, diverse traffic mix
Corroborated on test-bed traffic and vetted by admins
Ran eXpose on a 2.4GHz x86 with 8GB RAM
Trace locations
– Inside Microsoft
– Before CSAIL's Servers
– Access Link of Conf. LANs
– CSAIL's Access

Rules Discovered by eXpose
Dependencies for Major Applications
Example (Microsoft):
  Client.* – Mail.135
  Client.* – DC.88
  Client.* – Mail.X
  Client.* – PFS1.X
  Client.* – PFS2.X
  Client.* – Proxy.80

Rules Discovered by eXpose
Dependencies for Major Applications
Example (CSAIL):
  C.7001 – Root.7003
  C.7001 – *.*
  C.7001 – AFS
  C.7001 – AFS
  AFS – Root.7002

Rules Discovered by eXpose
Dependencies for Major Applications
– web, email, file-servers, IM, print, video broadcast
Example (Microsoft):
  Proxy1.80 – *.*
  Proxy2.80 – *.*
  Proxy3.80 – *.*
  Proxy4.80 – *.*

Rules Discovered by eXpose
Dependencies for Major Applications
– web, email, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
Example (CSAIL), labeled "smtp +":
  Client.* – MailServer.25
  Client.113 – MailServer.*

Rules Discovered by eXpose
Dependencies for Major Applications
– web, email, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
– IDENT, legacy emails, ssh scans, wingate
Example (CSAIL), labeled "Legacy":
  UnivMail.* – Old1.25
  UnivMail.* – Old2.25
  UnivMail.* – Old3.25

Rules Discovered by eXpose
Dependencies for Major Applications
– web, email, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
– IDENT, legacy emails, ssh scans, wingate
Rules for stuff we didn't know before
Example (CSAIL), labeled "Nagios":
  Nagios.7001 – AFS
  Nagios.7001 – AFS
  Nagios.* – Mail1.25
  Nagios.* – Mail2.25

Rules Discovered by eXpose
Dependencies for Major Applications
– web, email, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
– IDENT, legacy emails, ssh scans, wingate
Rules for stuff we didn't know before
– Nagios, LLMNR, iTunes
Example (hotspots), labeled "Link level multicast name":
  H.* – DNS.53
  H.137 – Wins.137
  H.* – Multicast.5355
Black box: little prior knowledge about servers, applications, or users
Can evolve

Correctness & Completeness
False positives
– We couldn't explain 13% of the rule-clusters in the CSAIL trace
False negatives
– Main CSAIL web server (too many different activities)
– Dependencies on personal web pages (too little traffic)
– PlanetLab traffic (punted)
Other limitations
– IPSec, anonymized, cover traffic
Extensions
– Rules repeat over time, and across traces
– Application whitelisting, customize generics

Time to Mine for Rules
[Figure: plot with axis label "# Flows (x 10^6)"]
At CSAIL's access link, high fan-out with many distinct flows
Stream mining appears feasible!

eXpose
[Figure: Packet Trace → Rules for frequently reoccurring flow sets]
Learn all significant rules with no specific knowledge
o Avoids observer bias, but stays feasible by focusing on high hit-rate
o Scoring function for rule significance
o Algorithms to mine and prune
Empirical validation on enterprise traces
o Found configurations & protocols that we didn't know existed
o Learnt rules for the actual behavior of applications
o Found config. errors, bot scans, infected machines

Backup

Expanding Search Space (# of flows)…
… exposes few significant rules!
[Figure: axes "Rule Score (Modified JMeasure)" and "# of Discovered Rules"]

Expanding Search Space (# of flows)…
… exposes few rules & costs a lot in time, memory
[Figure: axes "# Top Active Flows", "Time to Mine Rules (s)", and "Memory Footprint (million rules)"]

Varying Size of Time Windows
All window sizes in [.25s, 2s] produce similar rules!
[Figure: axes "Rule Score (Modified JMeasure)" and "# of Discovered Rules"]

For all rules over X and Y
[Figure: Prob.(X), Prob.(Y), and Joint Probability]