Where Are the Nuggets in System Audit Data?
Wenke Lee
College of Computing, Georgia Institute of Technology

Outline
Intrusion detection approaches and limitations
An example data mining (DM) based intrusion detection system (IDS)
Lessons learned and challenges ahead, or: where are the nuggets?

Cyber Threats and Countermeasures
Layered mechanisms: Prevent, Detect, React/Survive

Components of Intrusion Detection
Audit Records → Audit Data Preprocessor → Activity Data → Detection Engine (using Detection Models) → Alarms → Decision Engine (using a Decision Table) → Action/Report
Underlying assumptions:
– System activities are observable
– Normal and intrusive activities have distinct evidence
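The component chain above can be sketched as a minimal Python pipeline. This is a toy illustration, not the architecture of any particular IDS: the record fields (`service`, `flag`), the `service,flag` audit line format, the function names and the decision-table entries are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical activity-data record: the slide names the component, not its fields.
@dataclass
class ActivityRecord:
    service: str
    flag: str

@dataclass
class Alarm:
    record: ActivityRecord
    label: str

# Assumption: a detection model is a function that returns a label or None.
DetectionModel = Callable[[ActivityRecord], Optional[str]]

def preprocess(audit_records: Iterable[str]) -> List[ActivityRecord]:
    """Audit Data Preprocessor: parse assumed 'service,flag' lines into activity data."""
    records = []
    for line in audit_records:
        service, flag = line.strip().split(",")
        records.append(ActivityRecord(service, flag))
    return records

def detect(records: List[ActivityRecord], models: List[DetectionModel]) -> List[Alarm]:
    """Detection Engine: raise an alarm for the first model that labels a record."""
    alarms = []
    for rec in records:
        for model in models:
            label = model(rec)
            if label is not None:
                alarms.append(Alarm(rec, label))
                break
    return alarms

def decide(alarms: List[Alarm], decision_table: Dict[str, str]) -> List[str]:
    """Decision Engine: look up an action per alarm, defaulting to 'report'."""
    return [decision_table.get(a.label, "report") for a in alarms]

# Usage: one toy model that flags connection attempts with no reply (flag S0).
models: List[DetectionModel] = [lambda r: "syn_flood_suspect" if r.flag == "S0" else None]
actions = decide(detect(preprocess(["http,S0", "ftp,SF"]), models),
                 {"syn_flood_suspect": "block"})
print(actions)  # ['block']
```

Keeping detection models and the decision table as plain data passed into the engines mirrors the slide's separation of "models" from "engines": either can be swapped without touching the pipeline code.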

Limitations of Current IDSs
Misuse detection only:
– “We have the largest knowledge/signature base”
– Ineffective against new attacks
Individual attack-based:
– “Intrusion A detected; Intrusion B detected …”
– No ability to recognize attack plan
Statistical accuracy-based:
– “x% detection rate and y% false alarm rate”
– Are the most damaging intrusions detected?

Next Generation IDSs
Adaptive and cost-effective:
– Detect new intrusions
– Dynamically configure IDS components for best protection/cost performance
Scenario-based:
– Correlate (multiple sources of) audit data and attack information

Adaptive IDS – Model Coverage
Diagram: the IDS performs anomaly detection and passes anomaly data to the ID Modeling Engine, which semiautomatically produces ID models (misuse detection) for the IDS.

Semiautomatic Model Generation
Data mining based approach:
– Build classifiers as ID models
A prototype system:
– MADAM ID
– One of the best performing systems in the 1998 DARPA Evaluation

Background in Data Mining
Data mining:
– Applying specific algorithms to extract valid, useful and understandable patterns from data
Why apply data mining to intrusion detection?
– Motivation: semi-automatically construct or customize ID models for a given environment
– From the data-centric point of view, intrusion detection is a data mining/analysis process
– Successful applications in related domains, e.g., fraud detection and fault/alarm management

The Iterative DM Process of Building ID Models
raw audit data → packets/events (ASCII) → connection/session records → patterns → features → models
Figure: The MADAM ID Workflow
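As a rough sketch of the step from packets/events to connection/session records, the snippet below groups time-ordered packet events into connection records. It assumes a hypothetical event format and a simple heuristic (same source, destination and service; a new record starts after an idle gap). Real preprocessors track TCP connection state and ports, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class PacketEvent:          # hypothetical preprocessed packet/event line
    timestamp: float
    src: str
    dst: str
    service: str
    nbytes: int

@dataclass
class ConnectionRecord:     # hypothetical connection/session summary
    src: str
    dst: str
    service: str
    start: float
    duration: float
    total_bytes: int
    packets: int

def to_connection_records(events: List[PacketEvent],
                          idle_timeout: float = 60.0) -> List[ConnectionRecord]:
    """Group events by (src, dst, service); start a new record after an idle gap."""
    open_conns: Dict[Tuple[str, str, str], ConnectionRecord] = {}
    finished: List[ConnectionRecord] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.src, ev.dst, ev.service)
        rec = open_conns.get(key)
        # Close the current record if this event arrives after the idle timeout.
        if rec is not None and ev.timestamp - (rec.start + rec.duration) > idle_timeout:
            finished.append(rec)
            rec = None
        if rec is None:
            open_conns[key] = ConnectionRecord(ev.src, ev.dst, ev.service,
                                               ev.timestamp, 0.0, ev.nbytes, 1)
        else:
            rec.duration = ev.timestamp - rec.start
            rec.total_bytes += ev.nbytes
            rec.packets += 1
    finished.extend(open_conns.values())
    return sorted(finished, key=lambda r: r.start)

events = [
    PacketEvent(0.0, "a", "h1", "http", 100),
    PacketEvent(1.0, "a", "h1", "http", 50),
    PacketEvent(100.0, "a", "h1", "http", 10),
    PacketEvent(0.5, "b", "h2", "ftp", 20),
]
records = to_connection_records(events)
# Three records: a->h1 is split in two by the idle gap, plus one for b->h2.
for r in records:
    print(r.src, r.dst, r.service, r.start, r.duration, r.total_bytes)
```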

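ID as a Classification Problem
Records are partitioned by feature tests such as F1 = v1, or F1 = v2 && … && F5 = v5 && …; each split goes from a set with higher entropy (impurity) to subsets with lower entropy (purer).
Use features with high information gain, i.e., a large reduction in entropy.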
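To make "information gain" concrete: with entropy H(S) = −Σ p(c) log2 p(c) over class labels c, the gain of splitting a set S on feature F is Gain(S, F) = H(S) − Σ_v (|S_v|/|S|) H(S_v), where S_v holds the records with F = v. The sketch below computes both for a single feature. The toy labels and feature values are hypothetical and are only meant to show the two extremes.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: a feature that is constant across classes has zero gain,
# while one that separates the classes perfectly recovers the full 1 bit.
labels = ["syn flood", "syn flood", "normal", "normal"]
print(information_gain(["S0", "S0", "S0", "S0"], labels))        # 0.0
print(information_gain(["high", "high", "low", "low"], labels))  # 1.0
```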

The Feature Construction Problem
Example records (class labels: syn flood, normal):
dst   …   service   …   flag
h1    …   http      …   S0
h2    …   http      …   S0
h4    …   http      …   S0
h2    …   ftp       …   S0
Existing features: useless.
Construct features with high information gain, e.g., a new %S0 column for the same records.
How? Use temporal and statistical patterns, e.g., “a lot of S0 connections to the same service/host within a short time window”.

Mining Patterns
Associations of features:
– e.g., (service=http, flag=S0)
– Basic algorithm: association rules
Sequential patterns in activity records:
– e.g., (service=http, flag=S0), (service=http, flag=S0) → (service=http, flag=S0) [0.8, 2s]
– Basic algorithm: frequent episodes
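To make the association-rule idea concrete, the sketch below scores one candidate rule by support and confidence over a list of records. It is not a mining algorithm: a real miner (e.g., Apriori) enumerates all frequent itemsets, and frequent-episode mining additionally respects time windows. The records and function names are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]

def support(records: List[Record], itemset: Record) -> float:
    """Fraction of records that contain every feature=value pair in itemset."""
    hits = sum(all(r.get(k) == v for k, v in itemset.items()) for r in records)
    return hits / len(records)

def confidence(records: List[Record], lhs: Record, rhs: Record) -> float:
    """support(lhs and rhs together) / support(lhs), for the rule lhs -> rhs."""
    base = support(records, lhs)
    return support(records, {**lhs, **rhs}) / base if base else 0.0

# Hypothetical connection records reduced to two features.
records = [
    {"service": "http", "flag": "S0"},
    {"service": "http", "flag": "S0"},
    {"service": "http", "flag": "SF"},
    {"service": "ftp", "flag": "SF"},
]
print(support(records, {"service": "http", "flag": "S0"}))       # 0.5
print(confidence(records, {"service": "http"}, {"flag": "S0"}))  # 2/3, about 0.667
```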

Feature Construction from Patterns
Anomaly/intrusion records → mining → patterns
Historical normal and attack records → mining → patterns
Compare the two sets of patterns → intrusion patterns → new features → training data → learning → detection models

Feature Construction Example
An example: “syn flood” patterns (dst_host is the reference attribute):
– (flag = S0, service = http), (flag = S0, service = http) → (flag = S0, service = http) [0.6, 2s]
– Add features: count the connections to the same dst_host in the past 2 seconds and, among these connections, the percentage with the same service and the percentage with flag S0
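The syn flood features above can be sketched as a look-back over time-sorted connection records. Assumptions: each record has a timestamp in seconds plus `dst_host`, `service` and `flag` fields; the current connection is excluded from its own window; and the output names (`count`, `same_srv_rate`, `s0_rate`) are illustrative. The quadratic scan keeps the logic obvious; a per-host queue with running counts would avoid rescanning.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Conn:
    timestamp: float   # seconds
    dst_host: str
    service: str
    flag: str

def window_features(conns: List[Conn], window: float = 2.0) -> List[Dict[str, float]]:
    """For each connection (in time order), summarize earlier connections to the
    same dst_host that started within the last `window` seconds."""
    ordered = sorted(conns, key=lambda c: c.timestamp)
    feats: List[Dict[str, float]] = []
    for i, cur in enumerate(ordered):
        past = [c for c in ordered[:i]
                if c.dst_host == cur.dst_host and cur.timestamp - c.timestamp <= window]
        count = len(past)
        same_srv = sum(c.service == cur.service for c in past)
        s0 = sum(c.flag == "S0" for c in past)
        feats.append({
            "count": count,
            "same_srv_rate": same_srv / count if count else 0.0,
            "s0_rate": s0 / count if count else 0.0,
        })
    return feats

conns = [
    Conn(0.0, "h1", "http", "S0"),
    Conn(0.5, "h1", "http", "S0"),
    Conn(1.0, "h1", "ftp", "SF"),
    Conn(1.5, "h1", "http", "S0"),
    Conn(1.6, "h2", "http", "SF"),
]
features = window_features(conns)
print(features[3])  # count 3; same-service and S0 rates are both 2/3
```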

The Nuggets
Feature extraction and construction:
– The key to producing effective ID models
– Better pay-off than just applying another model learning algorithm
– How to semi-automate the feature discovery process (by incorporating domain knowledge)?

Feature Construction: the MADAM ID Example
Search the feature space in iterations; at each iteration:
– Use different heuristics to compute patterns (e.g., per-host service patterns) and construct features accordingly
Limitations:
– Connection level only
– Within-connection contents are not “structured”, and are much more challenging!

The Nuggets (continued)
Efficiency
– Training:
  – Huge amount of audit data
  – Sampling? Always retrain from scratch, or incrementally?
– Execution of the output model in real time:
  – Consider feature cost (time)
  – Trade-off of cost vs. accuracy

Cost-sensitive Modeling: an Example
A multiple-model approach:
– Build multiple rule-sets, each with features of a different cost level
– Use cheaper rule-sets first, and costlier ones later only when needed for the required accuracy
3 cost levels for features:
– Level 1: beginning of an event, cost 1
– Level 2: middle to end of an event, cost 10
– Level 3: multiple events in a time window, cost 100
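A minimal sketch of the multiple-model idea: rule-sets are tried from cheapest to costliest, and evaluation stops as soon as one is confident enough. The confidence threshold, the `(label, confidence)` return convention and the toy rules are assumptions for illustration; the costs 1, 10 and 100 follow the slide, and each level's feature cost is simply added once when that rule-set is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]

@dataclass
class RuleSet:
    cost_level: int    # 1, 2 or 3, as on the slide
    feature_cost: int  # 1, 10 or 100
    classify: Callable[[Event], Tuple[str, float]]  # assumed: returns (label, confidence)

def cascade_predict(event: Event, rule_sets: List[RuleSet],
                    min_confidence: float = 0.9) -> Tuple[str, int]:
    """Try rule-sets from cheapest to costliest; stop once one is confident enough.
    Returns the label and the total feature cost spent."""
    spent = 0
    label = "unknown"
    for rs in sorted(rule_sets, key=lambda r: r.feature_cost):
        spent += rs.feature_cost
        label, conf = rs.classify(event)
        if conf >= min_confidence:
            break
    return label, spent

# Hypothetical toy rule-sets for the three cost levels.
rule_sets = [
    RuleSet(1, 1, lambda e: ("normal", 0.95) if e["flag"] == "SF" else ("attack", 0.5)),
    RuleSet(2, 10, lambda e: ("attack", 0.6)),
    RuleSet(3, 100, lambda e: ("syn flood", 0.99) if e["flag"] == "S0" else ("normal", 0.9)),
]
print(cascade_predict({"flag": "SF"}, rule_sets))  # ('normal', 1)
print(cascade_predict({"flag": "S0"}, rule_sets))  # ('syn flood', 111)
```

The design choice here is that the common, easy case (a normal connection) is settled by the cheap level-1 features, while only ambiguous events pay for the expensive time-window features.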

The Nuggets (continued)
Anomaly detection
– What is a general approach?
– A taxonomy, with a specialized algorithm for each type?
– Theoretical foundations?

Conclusions
There is a need for DM in ID
Research should be focused on the real nuggets:
– Feature construction
– Efficiency
– Anomaly detection

Thank You!