Learning Communication Rules
Srikanth Kandula, Ranveer Chandra, and Dina Katabi

Network Admins Are Groping in the Dark
Focus on traffic volume
– TCP=80%, HTTP=30%
– Adapt report categories (e.g., AutoFocus): much traffic from ports 500-600
But what's going on?
– Does traffic follow the plan?
– Misconfigurations
– Suspicious traffic
– (Active) user browsing the web, reading/sending mail
– (Automatic) SMS scan on a network, Outlook refresh
Besides focusing on volume, learn the rules underlying the traffic

If you could learn such rules directly from a trace (http, DNS)
Rule: whenever flow Y happens, flow X is likely to occur
[Figure: timeline t marking occurrences of flow X and flow Y]
Infer the actual behavior of applications
– AFS root servers direct traffic to volume servers evenly
– Mail to the incoming MX is forwarded on to group MXes
Notice misconfigurations and badness
– These clients should not be talking on known command-control ports
– This server should not be responding to DHCP requests
– This mail server should not attempt connections to non-existent MXes

Report all significant rules with no specific knowledge about a trace

Mining for Rules is Hard
How to define significance?
– When is a group of flows interesting enough to report?
Avoid observer bias, but cannot evaluate everything
– Focus on one server, miss what you are not looking for
Practical: deal with noise, search quickly
eXpose
1. A scoring function for significance
2. Heuristics that bias search toward high hit-rate
3. Empirical validation on enterprise traces

Overview
Packet trace to Activity Matrix
o Rows are 1s windows; columns are flows
o Is flow active in [time i-1, time i)? (at least one packet)
Association rule mining (X, Y are r.v. for columns)
– Need not worry about interleaving
– Dependencies are at these time-scales (an RTT, a server response)
– All windows in [.25s, 2s] range yield similar rules
[Figure: Packet Trace to Activity Matrix (rows time 1 … time R, columns flow 1 … flow K) to Rules]
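The conversion from packet trace to activity matrix can be sketched as follows. This is a minimal illustration, not the eXpose implementation: the `Packet` record, its field names, and the choice to start the first window at the earliest packet are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical packet record: a timestamp in seconds plus a flow key.
Packet = namedtuple("Packet", ["time", "flow"])

def activity_matrix(packets, window=1.0):
    """Return (flows, matrix); matrix[i][k] is 1 if flow k sent
    at least one packet during the i-th half-open time window."""
    if not packets:
        return [], []
    flows = sorted({p.flow for p in packets})
    col = {f: k for k, f in enumerate(flows)}
    start = min(p.time for p in packets)
    end = max(p.time for p in packets)
    rows = int((end - start) // window) + 1
    matrix = [[0] * len(flows) for _ in range(rows)]
    for p in packets:
        i = int((p.time - start) // window)  # index of the window holding this packet
        matrix[i][col[p.flow]] = 1
    return flows, matrix
```

Each column of the result is the 0/1 random variable that the association-rule step works on; changing `window` is how one would try the other sizes in the [.25s, 2s] range.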

Which Rules are Significant? (X ⇒ Y)
High joint probability?
o X, Y may occur very often individually (e.g., breeze, sun shining)
High conditional probability?
o Say Y occurs only when X does, but both are rare (lottery, buy a jet)

Which Rules are Significant? (X ⇒ Y)
High joint probability? High conditional probability?
We use mutual information (combines the two)
* Measures fraction of change in Y due to X
* Trades off dependency & frequency
  Score=0, if Y is independent of X
  Score=Max, if Y is fully dependent on X
* Encodes directionality (e.g., Kerberos, Reservation)
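A small sketch of such a score, computed on two columns of the activity matrix. The slides name the score "Modified JMeasure" but do not give its formula, so this example assumes the standard J-measure, J(X ⇒ Y) = P(X) [ P(Y|X) log(P(Y|X)/P(Y)) + P(¬Y|X) log(P(¬Y|X)/P(¬Y)) ]; the function name and the 0/1-list input are illustrative choices, and eXpose's modifications are not reproduced here.

```python
import math

def jmeasure(x, y):
    """Standard J-measure (in bits) of the rule X => Y.
    x and y are equal-length 0/1 columns of an activity matrix."""
    n = len(x)
    count_x = sum(x)
    if count_x == 0:
        return 0.0  # X never occurs, so the rule has no support
    p_x = count_x / n
    p_y = sum(y) / n
    p_y_given_x = sum(1 for a, b in zip(x, y) if a and b) / count_x

    def term(p, q):
        # p * log2(p / q), using the convention 0 * log 0 = 0
        return 0.0 if p == 0 else p * math.log2(p / q)

    return p_x * (term(p_y_given_x, p_y) + term(1 - p_y_given_x, 1 - p_y))
```

The score is zero when Y is independent of X, and it is asymmetric: `jmeasure(x, y)` and `jmeasure(y, x)` generally differ, which is what lets a score of this kind encode directionality.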

Modifying Scores for Networking
Negative correlation
– Flows with little overlap
– P(¬Y|X) ≈ 1 leads to high score
Long-running flows
– Large downloads, ssh/remote desktop
– Trivial overlaps with long flow
– Distinguish new vs. present
– Present rules reported only if small mismatch in freq.
Too many possibilities
– Bias: focus on pairs with at least one common IP
– Miss some rules, but hit-rate up 1000x and costs down 10x
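The common-IP bias can be illustrated with a short sketch that only generates candidate pairs sharing an endpoint address. The `Flow` record and its field names are assumptions for the example, and indexing flows by IP is one possible way to avoid enumerating every pair; the slides do not say how eXpose does it.

```python
from collections import defaultdict, namedtuple
from itertools import combinations

# Hypothetical flow record; field names are illustrative.
Flow = namedtuple("Flow", ["src_ip", "src_port", "dst_ip", "dst_port"])

def candidate_pairs(flows):
    """Return the set of flow pairs that share at least one IP address."""
    by_ip = defaultdict(set)
    for f in flows:
        by_ip[f.src_ip].add(f)
        by_ip[f.dst_ip].add(f)
    pairs = set()
    for group in by_ip.values():
        # Sorting gives each pair one canonical order, so (a, b) and
        # (b, a) are never both emitted.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs
```

Only the pairs returned here would be scored, which is the trade the slide describes: some rules are missed, but far fewer pairs need evaluating.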

Generics
- Miss, if no client accesses server often
+ Rules that abstract away parts of a flow
Examples
– Client : Server, Server : Database becomes * : Server, Server : Database (* = any client)
– Client : Rsrv., Client : Kerberos becomes * : Rsrv., * : Kerberos (* = any client, but same on both sides)
To do this automatically,
– What to abstract? (IP addresses at non-server port)
– Which pairs to consider for a rule? Flows match IP, generics match abstracted IP
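A minimal sketch of the abstraction step, assuming the set of server ports is supplied by the caller (the slides do not say how server ports are identified). The `Flow` record, `abstract`, and `same_client` are hypothetical names introduced for illustration; only the IP is replaced, as the slide states, and ports are left untouched.

```python
from collections import namedtuple

# Hypothetical flow record; field names are illustrative.
Flow = namedtuple("Flow", ["src_ip", "src_port", "dst_ip", "dst_port"])

def abstract(flow, server_ports):
    """Return (generic_flow, hidden_ips): the IP at each non-server
    port is replaced by '*', and the replaced IPs are returned too."""
    hidden = []
    src_ip, dst_ip = flow.src_ip, flow.dst_ip
    if flow.src_port not in server_ports:
        hidden.append(src_ip)
        src_ip = "*"
    if flow.dst_port not in server_ports:
        hidden.append(dst_ip)
        dst_ip = "*"
    return Flow(src_ip, flow.src_port, dst_ip, flow.dst_port), tuple(hidden)

def same_client(flow_a, flow_b, server_ports):
    """True if both flows hide the same IP(s), i.e. the generic rule
    would bind '*' to the same client on both sides."""
    _, hidden_a = abstract(flow_a, server_ports)
    _, hidden_b = abstract(flow_b, server_ports)
    return bool(hidden_a) and hidden_a == hidden_b
```

Under these assumptions, two flows from one client to a reservation server and to Kerberos (port 88) are eligible for the "same on both sides" generic, while flows from different clients are not.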

Mining for Rules
Techniques extend to arbitrary sized rules: O(f^(n+1)). Instead,
1. Focus on pair-wise rules (simpler is likelier): O(f^2)
2. Group similar rules
   – Eliminate weak rules between strongly connected groups (Recursive Spectral Partitioning, VKV00)
   – Transitive closure to read off clusters
Digests 10^5 to 10^6 flows into 10^2 to 10^3 rule clusters
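The "transitive closure to read off clusters" step can be sketched with a union-find over the flows that appear in rules. Note the substitution: a plain score threshold (`min_score`, a hypothetical parameter) stands in for the recursive spectral partitioning that the slides cite for eliminating weak rules, so this shows only the clustering idea, not the pruning method.

```python
def rule_clusters(rules, min_score):
    """rules: iterable of (flow_x, flow_y, score) triples.
    Returns a list of sets; each set holds flows connected through
    rules whose score is at least min_score."""
    parent = {}

    def find(f):
        parent.setdefault(f, f)
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for x, y, score in rules:
        if score >= min_score:  # simple stand-in for weak-rule elimination
            parent[find(x)] = find(y)

    clusters = {}
    for f in list(parent):
        clusters.setdefault(find(f), set()).add(f)
    return list(clusters.values())
```

Flows that appear only in rules below the threshold are left out of the result entirely, which keeps weak links from merging otherwise separate clusters.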

Recap: eXpose Mines for Rules
[Figure: Packet Trace to Activity Matrix (rows time 1 … time R; columns flow 1 … flow K, each split into present | new) to Rules (… flow i.new, flow j.present ...) to Rule Clusters]
Contributions
Learn all significant rules without prior knowledge
o Scoring function for rule significance
o Avoids observer bias, yet stays feasible by focusing on high hit-rate
o Algorithms to mine and prune

Related Work
– Semi-Automated Discovery of App. Session Structure (KJPK06)
– Sherlock (Diagnosing Performance Problems, BCGKMZ07)
– Autofocus (ESV03)
– BLINC (KPF05)
– Stepping Stones (ZP00)
Learn all significant rules without prior knowledge
o Avoids observer bias, yet stays feasible by focusing on high hit-rate
o Scoring function for rule significance
o Algorithms to mine and prune

Results

Evaluation Setup
Traces at access and internal server-facing links
– Packet headers, connection records (Bro), some anon.
Operational n/w with 10^3 clients, diverse traffic mix
Corroborated on test-bed traffic & vetted by admins
Ran eXpose on a 2.4GHz x86 with 8GB RAM
Trace locations: Inside Microsoft, Before CSAIL's Servers, Access Link of Conf. LANs, CSAIL's Access

Rules Discovered by eXpose
Dependencies for Major Applications
email @ microsoft
  Client.* – Mail.135
  Client.* – DC.88
  Client.* – Mail.X
  Client.* – PFS 1.X
  Client.* – PFS 2.X
  Client.* – Proxy.80

Rules Discovered by eXpose
Dependencies for Major Applications
afs @ csail
  C.7001 – Root.7003
  C.7001 – *.*
  C.7001 – AFS1.7000
  C.7001 – AFS2.7000
  AFS1.7000 – Root.7002

Rules Discovered by eXpose
Dependencies for Major Applications
– web, e-mail, file-servers, IM, print, video broadcast
web @ microsoft
  Proxy1.80 – *.*
  Proxy2.80 – *.*
  Proxy3.80 – *.*
  Proxy4.80 – *.*

Rules Discovered by eXpose
Dependencies for Major Applications
– web, e-mail, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
smtp + IDENT @ csail
  Client.* – MailServer.25
  Client.113 – MailServer.*

Rules Discovered by eXpose
Dependencies for Major Applications
– web, e-mail, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
– IDENT, Legacy emails, ssh scans, wingate
Legacy email ids @ csail
  UnivMail.* – Old1.25
  UnivMail.* – Old2.25
  UnivMail.* – Old3.25

Rules Discovered by eXpose
Dependencies for Major Applications
– web, e-mail, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
– IDENT, Legacy emails, ssh scans, wingate
Rules for stuff we didn't know before
Nagios monitors @ csail
  Nagios.7001 – AFS1.7000
  Nagios.7001 – AFS2.7000
  Nagios.* – Mail1.25
  Nagios.* – Mail2.25

Rules Discovered by eXpose
Dependencies for Major Applications
– web, e-mail, file-servers, IM, print, video broadcast
Configuration Errors & Other Badness
– IDENT, Legacy emails, ssh scans, wingate
Rules for stuff we didn't know before
– Nagios, LLMNR, iTunes
Link-local multicast name resolution @ hotspots
  H.* – DNS.53
  H.137 – Wins.137
  H.* – Multicast.5355
Black box: little prior knowledge about servers, applications, or users
Can evolve

Correctness & Completeness
False Positives
– 13% of rule-clusters in CSAIL trace, we couldn't explain
False Negatives
– Main CSAIL Web Server (too many different activities)
– Dependencies on Personal Web Pages (too little traffic)
– PlanetLab Traffic (punted)
Other Limitations
– IPSec, Anonymized, Cover Traffic
Extensions
– Rules repeat over time, and across traces
– Application whitelisting, Customize Generics

Time to Mine for Rules
At CSAIL's access link, high fan-out with many distinct flows
Stream mining appears feasible!
[Figure: time to mine rules per trace; axis label # Flows (x 10^6), tick values as extracted: .6.2.6.9 2.8]

eXpose
[Figure: Packet Trace to Rules for frequently reoccurring flow sets]
Learn all significant rules with no specific knowledge
o Avoids observer bias, but feasible by focusing on high hit-rate
o Scoring function for rule significance
o Algorithms to mine and prune
Empirical validation on enterprise traces
– found configurations & protocols that we didn't know existed
– learnt rules for actual behavior of applications
– found config. errors, bot scans, infected machines
http://research.microsoft.com/~srikanth

Backup

Expanding Search Space (# of flows)… exposes few significant rules!
[Figure; axes: Rule Score (Modified JMeasure), # of Discovered Rules]

Expanding Search Space (# of flows)… exposes few rules & costs a lot in time, memory
[Figure; axes: # Top Active Flows, Time to Mine Rules (s), Memory Footprint (million rules)]

Varying Size of Time Windows
All window sizes in [.25s, 2s] produce similar rules!
[Figure; axes: Rule Score (Modified JMeasure), # of Discovered Rules]

[Figure: for all rules X ⇒ Y; panels: Prob.(X), Prob.(Y), Joint Probability]
