MINDS: Data Mining Based Network Intrusion Detection System Vipin Kumar Army High Performance Computing Research Center University of.


1 MINDS: Data Mining Based Network Intrusion Detection System
Vipin Kumar, Army High Performance Computing Research Center (AHPCRC), University of Minnesota
Team Members: Eric Eilertson, Paul Dokas, Levent Ertoz, Ben Mayer, Aleksandar Lazarevic, Michael Steinbach, George Simon, Varun Chandola, Mark Shaneck, Jaideep Srivastava, Zhi-Li Zhang, Yongdae Kim, Vipin Kumar

2 Information Assurance
• The sophistication and severity of cyber attacks are increasing
• ARL, the Army, DoD, and other U.S. government agencies are major targets for sophisticated state-sponsored cyber terrorists
• Cyber strategies can be a major force multiplier and equalizer
• Across DoD, computer assets have been compromised and information has been stolen, putting technological advantage and battlefield superiority at risk
• Security mechanisms always have inevitable vulnerabilities
• Firewalls are not sufficient to ensure security in computer networks
• Insider attacks
[Figures: Spread of the SQL Slammer worm 10 minutes after its deployment; Incidents reported to the Computer Emergency Response Team/Coordination Center]

3 Information Assurance
• Intrusion Detection System (IDS)
– Combination of software and hardware that attempts to perform intrusion detection
– Raises an alarm when a possible intrusion happens
• Traditional IDS tools are based on signatures of known attacks
– Example of a SNORT rule (MS-SQL “Slammer” worm): any -> udp port 1434 (content:"|81 F B 81 F1 01|"; content:"sock"; content:"send")
• Limitations
– The signature database has to be manually revised for each newly discovered type of intrusion
– Substantial latency in deploying newly created signatures across the computer system
– Cannot detect emerging cyber threats
– Not suitable for detecting policy violations and insider abuse
– Do not provide an understanding of network traffic
– Generate too many false alarms

4 Data Mining for Intrusion Detection
• Increased interest in data mining based intrusion detection
– Attacks for which it is difficult to build signatures
– Unforeseen/unknown/emerging attacks
• Misuse detection
– Builds predictive models from labeled data sets (instances are labeled as “normal” or “intrusive”) to identify known intrusions
– High accuracy in detecting many kinds of known attacks
– Cannot detect unknown and emerging attacks
• Anomaly detection
– Detects novel attacks as deviations from “normal” behavior
– Potentially high false alarm rate: previously unseen (yet legitimate) system behaviors may also be flagged as anomalies

5 Data Mining for Intrusion Detection
• Misuse Detection – Building Predictive Models
[Figure: Training Set and Test Set tables with categorical, temporal, and continuous attributes and a class column; Learn Classifier; Model]
• Anomaly Detection
• Summarization of attacks using association rules
– Rules Discovered: {Src IP = , Dest Port = 139, Bytes ∈ [150, 200]} --> {ATTACK}
• Key Technical Challenges
– Large data size
– High dimensionality
– Temporal nature of the data
– Skewed class distribution
– Data preprocessing
– On-line analysis


7 MINDS – Minnesota INtrusion Detection System
[Figure: MINDS system diagram. Labels: network; data capturing device; net flow tools; tcpdump; filtering; feature extraction; known attack detection; detected known attacks; anomaly detection; anomaly scores; human analyst; detected novel attacks; labels; association pattern analysis; summary and characterization of attacks]
• Data mining based intrusion detection system
• Incorporated into the Interrogator architecture at the ARL Center for Intrusion Monitoring and Protection (CIMP)
• Helps analyze data from multiple sensors at DoD sites around the country
• MINDS anomalies are used as the primary key when viewing related alerts from other tools (SNORT, Jids, etc.)
• MINDS is the first effective anomaly intrusion detection system used by ARL
• Routinely detects attacks and intrusive behavior not detected by widely used intrusion detection systems
• Insider abuse / policy violations / worms / scans

8 Feature Extraction Module
Three groups of features
• Basic features of individual TCP connections
– Source & destination IP: Features 1 & 2
– Source & destination port: Features 3 & 4
– Protocol: Feature 5
– Duration: Feature 6
– Bytes per packet: Feature 7
– Number of bytes: Feature 8
• Time based features
– For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last T seconds: Features 9 (13)
– Number of connections from the source (destination) IP to the same destination (source) port in the last T seconds: Features 11 (15)
• Connection based features
– For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last N connections: Features 10 (14)
– Number of connections from the source (destination) IP to the same destination (source) port in the last N connections: Features 12 (16)
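The time based and connection based features can be illustrated with a short Python sketch. This is not the MINDS implementation: the `Flow` record, its field names, and both function names are hypothetical. The sketch covers only the source-side counts (Features 9 and 11 over T seconds, Features 10 and 12 over N connections), omits the "inside the network" restriction, and assumes that "last N connections" means the N most recent connections seen overall.

```python
from collections import deque, namedtuple

# Hypothetical flow record; the field names are illustrative, not the MINDS schema.
Flow = namedtuple("Flow", "ts src_ip dst_ip dst_port")


def time_window_features(flows, T):
    """For each flow, return (unique destination IPs contacted by the same source,
    connections from the same source to the same destination port),
    counted over the last T seconds, current flow included."""
    window = deque()
    out = []
    for f in sorted(flows, key=lambda x: x.ts):
        window.append(f)
        # Drop flows that are T seconds old or older.
        while window and window[0].ts <= f.ts - T:
            window.popleft()
        same_src = [g for g in window if g.src_ip == f.src_ip]
        unique_dsts = len({g.dst_ip for g in same_src})
        same_port = sum(1 for g in same_src if g.dst_port == f.dst_port)
        out.append((unique_dsts, same_port))
    return out


def connection_window_features(flows, N):
    """Same two counts, taken over the last N connections instead of T seconds."""
    window = deque(maxlen=N)  # the oldest connection falls out automatically
    out = []
    for f in sorted(flows, key=lambda x: x.ts):
        window.append(f)
        same_src = [g for g in window if g.src_ip == f.src_ip]
        unique_dsts = len({g.dst_ip for g in same_src})
        same_port = sum(1 for g in same_src if g.dst_port == f.dst_port)
        out.append((unique_dsts, same_port))
    return out
```

The two variants complement each other: a slow scanner may leave almost nothing inside a short time window, while its earlier connections can still fall within the last N connections.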

9 Detection of Anomalies on Real Network Data
• Anomalies/attacks picked up by MINDS include scanning activities, worms, and non-standard behavior such as policy violations and insider attacks. Many of the attacks detected by MINDS were already on the CERT/CC list of recent advisories and incident notes.
• Some illustrative examples of intrusive behavior detected using MINDS at U of M
Scans
– Detected scanning for Microsoft DS service on port 445/TCP. Undetected by SNORT since the scanning was non-sequential (very slow). Rule added to SNORT in September 2002
– Detected scanning for Oracle server. Undetected by SNORT because the scanning was hidden within another Web scanning
– Detected a distributed Windows networking scan from multiple source locations
Policy Violations
– Identified machine running Microsoft PPTP VPN server on non-standard ports. Undetected by SNORT since the collected GRE traffic was part of the normal traffic
– Identified compromised machines running FTP servers on non-standard ports, which is a policy violation. Example of anomalous behavior following a successful Trojan horse attack
– Detected computers on the network apparently communicating with outside computers over a VPN or on IPv6
Worms
– Detected several instances of the slapper worm that were not identified by SNORT since they were variations of existing worm code
– Detected unsolicited ICMP ECHOREPLY messages to a computer previously infected with the Stacheldraht worm (a DDoS agent)

10 Typical Anomaly Detection Output
– January 26, 2003 (48 hours after the “slammer” worm)
• Anomalous connections that correspond to the “slammer” worm
• Anomalous connections that correspond to the ping scan
• Connections corresponding to UM machines connecting to “half-life” game servers

11 Summarization Using Association Patterns
[Figure labels: Anomaly Detection System; ranked connections; attack; normal; Discriminating Association Pattern Generator; update; Knowledge Base]
Example rules: R1: TCP, DstPort=1863 → Attack … R100: TCP, DstPort=80 → Normal
1. Build normal profile
2. Study changes in normal behavior
3. Create attack summary
4. Detect misuse behavior
5. Understand nature of the attack
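One way to read "discriminating association patterns" is as feature/value combinations that are frequent among connections labeled attack and rare among those labeled normal. The Python sketch below illustrates that idea only; it is not the MINDS pattern generator. The function name, the dictionary record format, and the `min_support` and `max_len` parameters are all assumptions.

```python
from collections import Counter
from itertools import combinations


def discriminating_patterns(attack, normal, min_support=0.5, max_len=2):
    """attack / normal: lists of dicts mapping feature name -> value.
    Returns (pattern, attack_support, normal_support) tuples, ordered by
    how much more frequent the pattern is in the attack set."""

    def supports(records):
        counts = Counter()
        for r in records:
            items = sorted(r.items())  # feature names are unique, so this is a stable order
            for n in range(1, max_len + 1):
                for combo in combinations(items, n):
                    counts[combo] += 1
        total = len(records) or 1  # avoid division by zero on an empty set
        return {p: c / total for p, c in counts.items()}

    attack_sup = supports(attack)
    normal_sup = supports(normal)
    result = [
        (p, s, normal_sup.get(p, 0.0))
        for p, s in attack_sup.items()
        if s >= min_support
    ]
    # Largest support difference first (illustrative ranking choice).
    result.sort(key=lambda t: -(t[1] - t[2]))
    return result
```

With records shaped like the example rules, a pattern such as `DstPort=1863` ranks first because it is common in the attack set and absent from the normal set, while `TCP` alone ranks last because it is equally common in both.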

12 Typical MINDS Output
• UM computer connecting to a remote FTP server running on port 5002
• Summarized TCP reset packets received from X.74, which is a victim of a DoS attack; we were observing backscatter, i.e., replies to spoofed packets
• Summarization of an FTP scan from a computer in Columbia, X.2
• Summary of IDENT lookups, where a remote computer tries to get the user name
• Summarization of a USENET server transferring a large amount of data

13 Typical MINDS Output
• UM computers doing bulk transfers
• Attack on Real-Media server (reported by CERT on September 9, 2003: RealNetworks media server RTSP protocol parser buffer overflow)
• 8200/tcp traffic related to gotomypc.com, which allows users to remotely control a desktop (involves a third party)
• Mysterious traffic currently being investigated

14 Typical MINDS Output
• UMN computers doing bulk transfers
• … is running a rogue FTP server on 60000/TCP
• UMN computers doing large transfers via BitTorrent to many outside hosts
• This computer is scanning for computers on port 139/TCP. The majority of the packets are 192 bytes or 144 bytes, except for the second summary (score 88.2)
• UMN computer running a RealMedia server that was not known to the analyst
• Odd-looking P2P traffic to/from a UMN computer (potentially KaZaA or Gnutella)
• The remote computer was scanning for 57/TCP, where RESET packets are sent back from computers that do not have 57/TCP open

15 Scan Detection
• Despite the importance of scan detection, its value is often overlooked
– Lack of good tools for scan detection
– Existing methods either miss stealth scans or give too many false alarms
– Fast scans are easy to catch using existing schemes, but stealth scans are very difficult to recognize
• MINDS employs our new methodology for detecting network scans
– Makes use of powerful new heuristics: only considers flows with a small number of packets; only considers scans in a subnet (not the whole Internet)
– Makes effective use of usage information: touches to rare IP / port combinations are more suspicious than others; a scanner will hit machines where the service is not available, resulting in a low count
• Very low false alarm rate
– Evaluation of 36 million flows over a 30-minute window at the University of Minnesota showed 2583 alarms but only 22 false alarms
– Evaluation on an hour of data at the ARL showed 1150 scans reported, but only 5 false alarms
• Routinely finds compromised machines at ARL-CIMP
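The slide names the heuristics but not the algorithm, so the Python sketch below is only one possible reading of them: keep flows with few packets, treat rarely touched IP / port combinations as suspicious, and flag sources that touch many of them. The function name, the tuple layout, and all three thresholds are assumptions, and usage counts are taken from the same batch of flows rather than from historical traffic.

```python
from collections import Counter, defaultdict


def scan_candidates(flows, max_packets=3, rare_count=1, min_touches=3):
    """flows: iterable of (src_ip, dst_ip, dst_port, packets), already restricted
    to destinations inside the monitored subnet.
    Returns {src_ip: number of rare IP/port combinations touched}."""
    flows = list(flows)
    # Usage information: how often each destination IP/port pair is contacted.
    usage = Counter((dst, port) for _, dst, port, _ in flows)
    touches = defaultdict(set)
    for src, dst, port, packets in flows:
        # Only short flows to rarely used IP/port combinations count as touches.
        if packets <= max_packets and usage[(dst, port)] <= rare_count:
            touches[src].add((dst, port))
    return {src: len(t) for src, t in touches.items() if len(t) >= min_touches}
```

A scanner that probes port 445 on many hosts where the service is not in use accumulates touches quickly, while a client that repeatedly contacts a busy web server accumulates none.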

16 Detecting Suspicious Ports for Possible Worm Activity
• We find destinations located within the network for which there is a high connection failure rate on specific ports for inbound, non-scan connections
• Then we find ports on which there are many such destinations
• The existence of these ports indicates a potential worm or slow scan
• This warrants targeted and more detailed data collection and analysis that cannot be done easily on the entire data
– Packet content analysis
– Signature generation
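The two-step procedure on this slide can be sketched in Python as follows. The function name, the input tuple layout, and the three thresholds are assumptions made for illustration; the slides do not give the values used in MINDS.

```python
from collections import defaultdict


def suspicious_ports(flows, min_conns=5, fail_rate=0.8, min_dests=3):
    """flows: iterable of (dst_ip, dst_port, failed) for inbound, non-scan
    connections to destinations inside the network.
    Returns {port: sorted list of destinations with a high failure rate}."""
    total = defaultdict(int)
    failed = defaultdict(int)
    for dst_ip, dst_port, is_failed in flows:
        total[(dst_ip, dst_port)] += 1
        failed[(dst_ip, dst_port)] += bool(is_failed)

    # Step 1: destination/port pairs with a high connection failure rate.
    dests_by_port = defaultdict(set)
    for (ip, port), n in total.items():
        if n >= min_conns and failed[(ip, port)] / n >= fail_rate:
            dests_by_port[port].add(ip)

    # Step 2: ports on which many such destinations exist.
    return {
        port: sorted(ips)
        for port, ips in dests_by_port.items()
        if len(ips) >= min_dests
    }
```

The ports returned are candidates for the targeted follow-up the slide mentions, such as packet content analysis restricted to that port.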

17 IP / port pairs for which a large percentage of connections failed

18 IP / port pairs for which a large percentage of connections failed (only for ports with many hits)

20
• 999 unique sources (Min:1, Max:28, Avg:1)
• 1126 unique destinations (Min:1, Max:55, Avg:1)
• 1516 total flows involved
• 1472 scan flows on port 80 (found by scan detector)

22
• 7982 unique sources (Min:1, Max:16, Avg:1)
• 6184 unique destinations (Min:1, Max:28, Avg:1)
• 9930 total flows involved
• 9406 scan flows on port 445 (found by scan detector)

24 Clustering
• Useful for detecting modes of behavior
– Shared Nearest Neighbor (SNN) clustering works quite well at determining modes of behavior
– Not distracted by “noise” in the data
• SNN is CPU intensive, O(N^2)
• Requires storing an N x K matrix
– K (number of neighbors) is typically between 10 and 20
– K should be about the size of the smallest expected mode
• Clustered 850,000 connections collected over one hour at one US Army fort
– Took 10 hours using 3 quad 2.8 GHz servers and 4 2 GHz workstations (total of 16 CPUs)
– Required around 100 MB of memory per PE for the distance calculations
– 500 MB of memory for the final clustering step on a single PE
• Found 3135 clusters
– Largest clusters around 500 records, smallest cluster 10 records
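The O(N^2) cost and the N x K matrix both come from the nearest-neighbor step. The Python sketch below shows only that step and the resulting shared-neighbor similarity, not the full SNN clustering procedure. The function names are hypothetical, and Euclidean distance on numeric tuples is an assumption, since the slides do not say which distance MINDS uses.

```python
import math


def knn_lists(points, k):
    """Brute-force k-nearest-neighbor lists.
    Computes O(N^2) distances and stores N lists of k indices (the N x K matrix)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of nearest neighbors shared by points i and j.
    Returns 0 unless each point appears in the other's neighbor list."""
    if j not in neighbors[i] or i not in neighbors[j]:
        return 0
    return len(set(neighbors[i]) & set(neighbors[j]))
```

Points in the same dense group share most of their neighbors and get a high similarity, while an isolated noise point rarely appears in anyone's neighbor list and so gets a similarity of 0 with most points.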

25 Detecting Large Modes of Network Traffic Using Clustering
• Large clusters of VPN traffic (hundreds of connections)
• Used between forts for secure sharing of data and working remotely

26 Detecting Unusual Modes of Network Traffic Using Clustering
• Clusters involving GoToMyPC.com (Army data)
• Policy violation, allows remote control of a desktop

27 Detecting Unusual Modes of Network Traffic Using Clustering
• Clusters involving mysterious ping and SNMP traffic

28 Detecting Unusual Modes of Network Traffic Using Clustering
• Clusters involving unusual repeated ftp sessions
• Further investigations revealed a misconfigured Army computer was trying to contact Microsoft

29 MINDS: CRITICAL TO COMPLETE FUNCTIONALITY
[Figure labels: Header Analysis; Packet-Based Signature Detection; Session-Based Signature Detection; Simple Scans; Viruses and Worms; Scans with Automatic Virus Attacks; Scans with Target Responses; New and Variant Attacks; Compromises; Behavior Analysis (MINDS); Anomaly Detection and New Attacks]
Army Research Laboratory (ARL), supported by the AHPCRC and the MINDS initiative, successfully monitors and analyzes network data to protect ARL and its Army and DoD customer infospace

30 Current MINDS Research and Development Work
• Correlation of suspicious events across network sites
– Helps detect sophisticated attacks not identifiable by single site analyses
– Scalable anomaly detection
– Distributed correlation algorithms
– Grids & middleware
• Analysis of long term data (months/years)
– Uncover suspicious stealth activities (e.g. insiders leaking/modifying information)

