Immune System Metaphors Applied to Intrusion Detection and Related Problems by Ian Nunn, SCS, Carleton University


Overview of Presentation
• Review of immune system properties of most interest
• Algorithm design and the representation of application domains
• Examples of two recognition algorithms
• Overview of application areas
• Focus on intrusion detection systems (IDS)
• Advantages of IS models and future research
• The IS model as a swarm system

Immune System Characteristics of Interest
• The human immune system (IS) is a system of detectors (principally B and T cells) that:
  – After initial negative selection (tolerization), does not recognize elements of the body (self)
  – Is adaptable in that it can recognize, over time, any foreign element (non-self), including those never before encountered
  – Remembers previous encounters with foreign elements
  – Dynamically regenerates its elements
  – Regulates the population size and diversity of its elements
  – Is robust to input signal noise (recognition region) and detector loss
  – Is distributed in nature, with no central or hierarchical control
  – Is error tolerant in that self recognition does not halt the system
  – Is self-protecting since it is part of self

Representation of Self/Non-Self
• IS elements involved are cellular proteins and their peptide sequences
• Recognition is based on matching of structural regions called epitopes on antigens and paratopes on antibodies
• Shape space model: a parameterized representation (genotype) of the conformational form of self/non-self elements (phenotype)

IS Application Algorithm Design
• Requires a deep understanding of the problem domain
• Self/non-self discrimination is the fundamental IS principle
• Steps in designing an IS algorithm:
  – Identification of features allowing correct and complete self/non-self discrimination*
  – Representation or encoding of features, particularly of continuous real-valued parameters*. Ab and Ag feature strings of the same length facilitate algorithm performance analysis
  – Determination of a matching or fitness function. Important for evolution of Ab populations (affinity maturation)
  – Selection of IS principles to apply, e.g. negative selection, costimulation, affinity maturation, etc.
* This is hard stuff and an important step in applying any modeling technique, whether genetic algorithms or swarm simulations (recall for army ants the problem of deciding what parameters to assign to the ants and to the environment and what values to allow).

Approach to Feature Selection and Representation
• Antibodies and antigens are represented by strings of features:
  – The set of actual values observed, such as sensor readings, voltages, ASCII text, is called the application’s phenotype
  – The coded representation is called the application’s genotype
• A feature is encoded by symbols from a finite alphabet
• Some application feature domains:
  – Binary variables: digital signals in computer systems
  – Discrete real variables: ASCII character text
  – Continuous real variables: real world sensors
• Continuous domains must be mapped onto discrete domains since we work with finite alphabets to ensure finite Ab/Ag population spaces
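The mapping from a continuous phenotype to a finite-alphabet genotype can be sketched as simple uniform binning. This is only an illustration: the function names `quantize` and `encode`, the ten-symbol alphabet and the clamping of out-of-range readings are assumptions, not something the slides specify.

```python
def quantize(value, low, high, alphabet="0123456789"):
    """Map one real-valued reading in [low, high] to one alphabet symbol.

    Uniform bins are an assumption; readings outside the range are clamped.
    """
    if not low < high:
        raise ValueError("low must be smaller than high")
    clamped = min(max(value, low), high)
    # Scale to [0, len(alphabet)] and truncate to a bin index.
    index = int((clamped - low) / (high - low) * len(alphabet))
    # value == high would land one past the last bin, so cap it.
    return alphabet[min(index, len(alphabet) - 1)]


def encode(readings, low, high, alphabet="0123456789"):
    """Encode a sequence of readings (phenotype) as a feature string (genotype)."""
    return "".join(quantize(v, low, high, alphabet) for v in readings)
```

For instance, `encode([0.0, 0.25, 0.99], 0.0, 1.0)` gives a three-symbol string, one symbol per sensor reading. A coarser alphabet shrinks the Ab/Ag population space at the cost of resolution.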

Phenotype Representation: Change Detection Problem Domains
• OS (UNIX) processes: sequences of top level system calls
• Program execution: alphabet symbols represent op codes
• File system: reduction to ASCII or binary strings
• User behavior and interface use: keystrokes, mouse clicks
• Time series data representation of physical (plant) processes: x/y position of a milling machine tool
• Memory accesses: memory address calls
• Local network traffic: TCP/IP packets: addresses, ports
• Network traffic through routers and gateways: TCP/IP packets, addresses, volume

IS Phenotype Encoding and Matching Using a Binary Model [1]
• Genotype = Phenotype: 32-bit string on a binary alphabet
• Many matching (fitness) functions are possible, e.g. ones based on the complementary match (Hamming distance), with l_i the length of the i-th contiguous substring of 1's in the complementary match
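As a rough sketch of the two ingredients named above, the complementary match of two bit strings can be taken as their XOR, from which both the Hamming count and the lengths of the contiguous runs of 1's follow. The function names are hypothetical, and how the run lengths are weighted into a final score is left open because the slide's formula is not available here.

```python
def complement_bits(ab, ag, width=32):
    """XOR of antibody and antigen: a 1 marks a complementary position."""
    mask = (1 << width) - 1
    return (ab ^ ag) & mask


def hamming_match(ab, ag, width=32):
    """Number of complementary positions (Hamming distance)."""
    return bin(complement_bits(ab, ag, width)).count("1")


def run_lengths(ab, ag, width=32):
    """Lengths of the contiguous runs of 1's in the complementary match."""
    bits = format(complement_bits(ab, ag, width), "0{}b".format(width))
    return [len(run) for run in bits.split("0") if run]
```

With `width=4`, the pair `0b1011` and `0b0001` has two complementary positions forming two separate runs of length one, whereas `0b1100` and `0b0011` form a single run of length four.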

Example: Use of a Binary Model with a GA for Clonal Selection [1]
• Start with randomly generated Ag and possibly incomplete Ab populations
• For each Ab in turn, compute its average match (fitness) with a random fixed-size Ag subpopulation
• Use a standard GA with mutation but no crossover to evolve successively better generations of antibodies
• Niches are observed to develop in coverage space for genetic commonalities (bacterial polysaccharide coating) if the initial populations have a bias
• Self recognition is minimized (without negative selection) by selecting for more Ag-specific instead of more general antibodies, which are less likely to match self
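The loop described above might look roughly like the following mutation-only GA. It is a minimal sketch under stated assumptions: fitness is the plain Hamming count of complementary bits, selection keeps the fitter half of the population (truncation selection), and the mutation rate, sample size and all function names are illustrative choices rather than the settings used in [1].

```python
import random

WIDTH = 32  # bits per antibody/antigen, as in the binary model


def match(ab, ag):
    """Complementary match: number of differing bits (an assumed fitness)."""
    return bin((ab ^ ag) & ((1 << WIDTH) - 1)).count("1")


def fitness(ab, antigens, sample_size, rng):
    """Average match against a random fixed-size antigen subpopulation."""
    sample = rng.sample(antigens, sample_size)
    return sum(match(ab, ag) for ag in sample) / sample_size


def mutate(ab, rate, rng):
    """Flip each bit independently with probability `rate`."""
    for bit in range(WIDTH):
        if rng.random() < rate:
            ab ^= 1 << bit
    return ab


def evolve(antibodies, antigens, generations=50, sample_size=5,
           rate=0.02, seed=0):
    """Mutation-only GA; truncation selection is an assumption."""
    rng = random.Random(seed)
    population = list(antibodies)
    for _ in range(generations):
        ranked = sorted(
            population,
            key=lambda ab: fitness(ab, antigens, sample_size, rng),
            reverse=True,
        )
        parents = ranked[: max(1, len(ranked) // 2)]
        # Refill the population with mutated copies of surviving parents.
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return population
```

Because each antibody is scored against a different random sample, the fitness is noisy; that noise is part of the scheme described on the slide rather than a defect of the sketch.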

Establishing Antibody Fitness
[Figure; label: Random subpopulation]

GA Evolution of Antibodies [1]
[Figure]

Use of a Negative Selection Algorithm for Clonal Selection [5, 2]
• Want explicit self-filtering (tolerization)
• Algorithm:
  1. Generate the set S of self (sub)strings
  2. Generate a set R0 of random strings
  3. Match each string from R0 against S:
     – Match (non-complementary) on at least r contiguous locations: reject
     – No match: add string to detector set R
• How to generate detectors efficiently is an issue
• Match detector set against target strings to detect an intruder
• Strings can be on any alphabet
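The three steps above can be sketched with plain generate-and-test, which is the simplest (and least efficient) way to build the detector set R. The function names, the retry cap `max_tries` and the fixed random seed are assumptions added for illustration.

```python
import random


def r_contiguous_match(a, b, r):
    """True if a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, length, r, n_detectors,
                       alphabet="01", seed=0, max_tries=100000):
    """Draw random candidates (R0) and keep those that match no self string."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        # Step 3: reject on any r-contiguous match against self.
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(string, detectors, r):
    """A target string is flagged when any detector matches it."""
    return any(r_contiguous_match(d, string, r) for d in detectors)
```

By construction no detector matches a self string, so unchanged self data is never flagged; a changed string is flagged only if some detector happens to cover it, which is why the size of R and the choice of r matter.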

Negative Selection Algorithm [2]
[Figure: worked example with length of contiguous match substring r = 2. Self string to be protected: (not shown). Match Ab1: 10xx. Match Ab4: xx00.]

The Problem of Holes [6]
• For a particular choice of matching rule and Ab repertoire, some non-self strings may not be found, causing a hole in the coverage space
• Let s1 and s2 be two self strings that match each other over r-1 contiguous bits, and let h1 and h2 be two antigens
• A detector that matches any r contiguous bits in h1 will also match either s1 or s2 over the same bits, so it is eliminated by negative selection. The same holds for h2. So h1 and h2 are undetectable.
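For very short strings the holes can be found by brute force, which makes the argument concrete. The sketch below assumes the r-contiguous matching rule and enumerates every possible detector; `find_holes` is a hypothetical helper, and the approach is only feasible for toy lengths.

```python
from itertools import product


def r_contiguous_match(a, b, r):
    """True if a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def find_holes(self_set, length, r, alphabet="01"):
    """Non-self strings that no valid detector can match."""
    universe = ["".join(p) for p in product(alphabet, repeat=length)]
    # Valid detectors are exactly the strings that survive negative selection.
    valid = [d for d in universe
             if not any(r_contiguous_match(d, s, r) for s in self_set)]
    return {h for h in universe
            if h not in self_set
            and not any(r_contiguous_match(d, h, r) for d in valid)}
```

With self strings `0110` and `1111` and r = 3, the two self strings agree on r-1 = 2 contiguous bits, and `find_holes({"0110", "1111"}, 4, 3)` returns `{"0111", "1110"}`: every 3-bit window of either string is also a window of one of the self strings at the same position.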

Major Application Categories of Immune System Theory
• Machine learning and pattern recognition: limited but promising work done to date
• Associative memories: limited work done to date
• Elimination of identified elements:
  – IS model: use the B cell and Tk cell “kill” analogue to disable viruses
  – Use a phagocyte analogue for cleanup and garbage collection
  – IBM virus lab and Forrest’s group at UNM have looked at this
• Recovery, repair and augmentation of identified elements:
  – IS model: use the B cell and Tk cell analogue to deliver a positive payload to an agent
  – Very little work done to date

Application Areas (cont.)
• Detection problems, where most of the work has occurred:
  – Fault: failure of a self element (industrial plant systems)
  – Change: any change in self (tumors)
  – Anomaly: unusual presentation of a self element
  – Virus: presence of a non-self element
  – Intrusion: attempt to gain access by a non-self element
  – Many of the classical issues of computer and network security involve some element of detection or self/non-self discrimination

The Intrusion Detection Problem
• Two classical types of intrusion detection systems (IDS):
  – Host-based: domain is a single machine, possibly on a network
  – Network-based: domain is a network of hosts
• Two classes of problem:
  – Anomaly detection: deviations from normal local resource use and network traffic
  – Misuse detection: usage identified with known system vulnerabilities and security policies

Essential Requirements of a Network-based IDS
• Robustness to host failures and noisy signals (anomalous behavior)
• Easy (self-)configurability of hosts
• Easy extendibility to new hosts
• Scalability: extendible to large networks without degradation of performance
• Adaptability: dynamically able to recognize new anomalies
• Efficiency: simple and low overhead operation
• Global analysis: able to correlate local events to form global patterns

Network Representation
• Commonly represent the problem as the connection events (not message content) between computers
• Kim and Bentley [4]:
  – Phenotype: 35 real-valued fields in four categories: connection identifier, port vulnerabilities, TCP handshaking, traffic intensity
  – Genotype: 35 genes. A detector gene has three “nucleotides” (cluster number in (0, 9), min offset, max offset). An antibody or antigen has a single real value.
  – Cluster and offset tables established for each host at start
  – A matching function maps an Ag or Ab value to a cluster and takes the distance to the nearest cluster bound as the measure of similarity
  – Use positive detection events to evolve the offsets for clusters

Kim and Bentley Model [4]
[Figure; labels: new lower interval bound for cluster 2, new upper interval bound for cluster 2]

Network Representation (cont.)
• Hofmeyr and Forrest [3]:
  – Phenotype: 3 integer fields (source IP address, destination IP address, service or port number)
  – Genotype: for a detector, 49-bit binary string + state

Algorithmic Refinements [3]
• Detectors may have a lifetime at the end of which they are replaced if they have not matched, which maintains diversity
• Activation threshold, with time decay on the activation level, to deter autoimmune reaction to rare self strings
• Local activation causes a message to be sent to other hosts, decreasing their activation thresholds (cytokine costimulation)
• Matching rule may result in holes in coverage. A randomly assigned permutation mask to control packet presentation helps avoid this (MHC molecule host diversity)
• Each host has a unique detector set, contributing to diversity and self-protection across a population of hosts
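The first two refinements can be illustrated with a small per-detector state machine. This is a sketch only: the `Detector` fields, the multiplicative decay, and the numeric values for threshold, decay and lifetime are assumptions chosen to make the behavior visible, not parameters from [3].

```python
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str
    age: int = 0
    activation: float = 0.0
    has_matched: bool = False


def step(detector, matched, threshold=3.0, decay=0.9, lifetime=100):
    """Advance one time step; return 'alarm', 'replace' or 'ok'."""
    detector.age += 1
    # Time decay: old matches count for less and less.
    detector.activation *= decay
    if matched:
        detector.activation += 1.0
        detector.has_matched = True
    if detector.activation >= threshold:
        return "alarm"
    # A detector that never matched during its lifetime is recycled.
    if detector.age >= lifetime and not detector.has_matched:
        return "replace"
    return "ok"
```

With these values several matches in quick succession push the activation over the threshold, while an occasional match on a rare self string decays away before the next one arrives, so it never raises an alarm.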

The Hofmeyr and Forrest Model [3]
[Figure; labels: host-based refinement fields, detector state, antigen phenotype]

Problem Posed by Computer Applications
• The repertoire of human self proteins is fixed over a lifetime
• In networks, valid hosts are added and deleted without notice, so what constitutes “self” changes constantly
• Among a fixed set of hosts, valid usage patterns may change without notice
• One solution: costimulation by a trusted (human) authority, both at start and during subsequent operation
• Much work needs to be done

Advantages of IS Models
• Adaptability through the ability to recognize foreign patterns never before encountered
• Distributed detection contributes to:
  – Diversity (shape space coverage)
  – Robustness (failure of individual hosts)
  – Scalability and extendibility
• Quick response to new variants of old attacks
• Ability to reproduce detectors of increasing fitness while self-regulating the overall population

IS as a Swarm System
• The IS model has a number of characteristics in common with swarm systems:
  – Large populations of independent agents of characterizable classes
  – Each agent has at most a very few characteristic simple behaviors:
      – Bind with another appropriate agent and activate (B and T cells)
      – Kill something (killer T cell)
      – Clone myself (B cell)
      – Secrete a signaling chemical or an antibody (T and B cells)
      – Live for a long time (memory B and T cells)
  – Simple interactions with the environment:
      – Special things that happen in lymphoid organs
      – Secreting signal chemicals which alter environmental properties (cytokines and inflammation)
  – Self-organizing as an emergent property
  – No centralized control over the system

Areas for Additional Research
• Matching rules with good computational properties, perhaps application-specific ones
• Self/non-self representation and encoding
• Algorithms for generating detector sets
• Other selection algorithms
• Incorporation of additional IS characteristics
• Detector set populations: evolution, dynamics and emergent properties at the species level

References
1. Forrest, Smith, Javornik and Perelson. Using Genetic Algorithms to Explore Pattern Recognition in the Immune System. Evolutionary Computation, 1(3).
2. Forrest, Allen, Perelson and Cherukuri. Self-Nonself Discrimination in a Computer. Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos CA.
3. Hofmeyr and Forrest. Immunity by Design: An Artificial Immune System. In Proceedings of 1999 GECCO Conference.
4. Kim and Bentley. Negative selection and niching by an artificial immune system for network intrusion detection. In Late Breaking Papers at the 1999 Genetic and Evolutionary Computation Conference, Orlando, Florida, 1999.

References (cont.)
5. Forrest, Allen, Perelson and Cherukuri. A Change-Detection Algorithm Inspired by the Immune System. Submitted to IEEE Transactions on Software Engineering.
6. D'haeseleer, Forrest and Helman. An Immunological Approach to Change Detection: Algorithms, Analysis, and Implications. Proceedings of the 1996 IEEE Symposium on Security and Privacy, 1996.