Immune System Metaphors Applied to Intrusion Detection and Related Problems by Ian Nunn, SCS, Carleton University


1 Immune System Metaphors Applied to Intrusion Detection and Related Problems by Ian Nunn, SCS, Carleton University inunn@digitaldoor.net

2 Overview of Presentation
Review of immune system properties of most interest
Algorithm design and the representation of application domains
Examples of two recognition algorithms
Overview of application areas
Focus on intrusion detection systems (IDS)
Advantages of IS models and future research
The IS model as a swarm system

3 Immune System Characteristics of Interest
The human immune system (IS) is a system of detectors (principally B and T cells) that:
– After initial negative selection (tolerization), does not recognize elements of the body (self)
– Is adaptable in that it can recognize, over time, any foreign element (non-self), including those never before encountered
– Remembers previous foreign element encounters
– Dynamically regenerates its elements
– Regulates the population size and diversity of its elements
– Is robust to input signal noise (recognition region) and detector loss
– Is distributed in nature with no central or hierarchical control
– Is error tolerant in that self recognition does not halt the system
– Is self-protecting since it is part of self

4 Representation of Self/Non-Self
IS elements involved are cellular proteins and their peptide sequences
Recognition is based on matching of structural regions called epitopes on antigens and paratopes on antibodies
Shape space model: a parameterized representation (genotype) of the conformational form of self/non-self elements (phenotype)

5 IS Application Algorithm Design
Requires a deep understanding of the problem domain
Self/non-self discrimination is the fundamental IS principle
Steps in designing an IS algorithm:
– Identification of features allowing correct and complete self/non-self discrimination*
– Representation or encoding of features, particularly of continuous real-valued parameters*. Antibody (Ab) and antigen (Ag) feature strings of the same length facilitate algorithm performance analysis
– Determination of a matching or fitness function. Important for evolution of Ab populations (affinity maturation)
– Selection of IS principles to apply, e.g. negative selection, costimulation, affinity maturation, etc.
* This is hard stuff and an important step in applying any modeling technique, whether genetic algorithms or swarm simulations (recall, for army ants, the problem of deciding what parameters to assign to the ants and to the environment and what values to allow).

6 Approach to Feature Selection and Representation
Antibodies and antigens are represented by strings of features:
– The set of actual values observed, such as sensor readings, voltages or ASCII text, is called the application’s phenotype
– The coded representation is called the application’s genotype
A feature is encoded by symbols from a finite alphabet
Some application feature domains:
– Binary variables: digital signals in computer systems
– Discrete real variables: ASCII character text
– Continuous real variables: real world sensors
Continuous domains must be mapped onto discrete domains, since we work with finite alphabets to ensure finite Ab/Ag population spaces
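The mapping from a continuous domain onto a finite alphabet can be sketched as simple uniform binning. This is only an illustration: the function name `encode`, the ten-symbol alphabet and the 0 to 5 volt sensor range are assumptions, not something taken from the presentation.

```python
def encode(value, low, high, alphabet="0123456789"):
    """Map a real value in [low, high] onto one symbol of a finite alphabet.

    Uniform binning is an assumption made for this sketch; a real
    application may need domain-specific (non-uniform) bins.
    """
    if not low <= value <= high:
        raise ValueError("value outside the expected range")
    n = len(alphabet)
    # Scale to [0, n] and clamp so that value == high falls in the last bin.
    index = min(int((value - low) / (high - low) * n), n - 1)
    return alphabet[index]


# Phenotype: hypothetical sensor readings in volts (range 0.0 to 5.0).
readings = [0.0, 2.49, 2.5, 4.99, 5.0]
# Genotype: one alphabet symbol per reading.
genotype = "".join(encode(v, 0.0, 5.0) for v in readings)
print(genotype)  # 04599
```

The alphabet size fixes the resolution: a larger alphabet separates nearby readings but enlarges the Ab/Ag population space.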

7 Phenotype Representation: Change Detection Problem Domains
OS (UNIX) processes: sequences of top level system calls
Program execution: alphabet symbols represent op codes
File system: reduction to ASCII or binary strings
User behavior and interface use: keystrokes, mouse clicks
Time series data representation of a physical (plant) process: x/y position of a milling machine tool
Memory accesses: memory address calls
Local network traffic: TCP/IP packets (addresses, ports)
Network traffic through routers and gateways: TCP/IP packets, addresses, volume

8 IS Phenotype Encoding and Matching Using a Binary Model [1]
Genotype = Phenotype: 32-bit string on a binary alphabet
Many matching (fitness) functions are possible, e.g. functions of the number of 1’s in the complementary match (Hamming distance), or of l_i, the lengths of the contiguous substrings of 1’s in the complementary match
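A minimal sketch of two such matching functions on binary strings follows. It assumes the complementary match is the bitwise XOR of antibody and antigen, and it uses 8-bit strings instead of 32 bits for readability. The function names are illustrative only, and the slide's own formula is not reproduced here.

```python
def complementary_match(ab: str, ag: str) -> str:
    """Bit string with a 1 wherever antibody and antigen bits differ (XOR)."""
    if len(ab) != len(ag):
        raise ValueError("Ab and Ag strings must have the same length")
    return "".join("1" if a != b else "0" for a, b in zip(ab, ag))


def hamming_score(ab: str, ag: str) -> int:
    """Total number of complementary positions (Hamming distance)."""
    return complementary_match(ab, ag).count("1")


def longest_run(ab: str, ag: str) -> int:
    """Length of the longest contiguous run of 1's in the complementary match."""
    return max(len(run) for run in complementary_match(ab, ag).split("0"))


ab, ag = "10110100", "01001110"
print(complementary_match(ab, ag))  # 11111010
print(hamming_score(ab, ag))        # 6
print(longest_run(ab, ag))          # 5
```

A run-based score rewards contiguous complementary regions, while the Hamming score treats every position independently.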

9 Example: Use of a Binary Model with a GA for Clonal Selection [1]
Start with randomly generated Ag and possibly incomplete Ab populations
For each Ab in turn, compute its average match (fitness) with a random fixed-size Ag subpopulation
Use a standard GA with mutation but no crossover to evolve successively better generations of antibodies
Niches observed to develop in coverage space for genetic commonalities (bacterial polysaccharide coating) if the initial populations have a bias
Self recognition minimized (without negative selection) by selecting for more Ag-specific antibodies, which are less likely to match self, instead of more general ones
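The loop described above can be sketched as a mutation-only GA. This is a rough illustration, not the procedure of [1]: truncation selection (keep the better half, refill with mutated copies), the population sizes, the mutation rate and all function names are assumptions made here.

```python
import random


def match(ab, ag):
    """Complementary match score: number of positions where the bits differ."""
    return sum(a != b for a, b in zip(ab, ag))


def fitness(ab, antigens, sample_size, rng):
    """Average match of one antibody against a random fixed-size Ag sample."""
    sample = rng.sample(antigens, sample_size)
    return sum(match(ab, ag) for ag in sample) / sample_size


def mutate(ab, rate, rng):
    """Flip each bit independently with probability `rate` (no crossover)."""
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in ab)


def evolve(antibodies, antigens, generations=30, sample_size=5, rate=0.05, seed=0):
    """Assumed scheme: keep the fitter half, replace the rest by mutated copies."""
    rng = random.Random(seed)
    population = list(antibodies)
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda ab: fitness(ab, antigens, sample_size, rng),
                        reverse=True)
        parents = ranked[: (len(ranked) + 1) // 2]
        n_children = len(ranked) - len(parents)
        children = [mutate(ab, rate, rng) for ab in parents[:n_children]]
        population = parents + children
    return population


def random_strings(count, length, rng):
    """Random bit strings represented as tuples of 0/1."""
    return [tuple(rng.randint(0, 1) for _ in range(length)) for _ in range(count)]


rng = random.Random(1)
antigens = random_strings(20, 16, rng)
antibodies = random_strings(10, 16, rng)
evolved = evolve(antibodies, antigens)
print(len(evolved), "antibodies after evolution")
```

Because each antibody is scored against a fresh random Ag sample, fitness is noisy, so individual generations are not guaranteed to improve.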

10 Establishing Antibody Fitness (figure; label: random subpopulation)

11 GA Evolution of Antibodies [1] (figure; example bit string: 1011010111110111)

12 Use of a Negative Selection Algorithm for Clonal Selection [5, 2]
Want explicit self-filtering (tolerization)
Algorithm:
1. Generate the set S of self (sub)strings
2. Generate a set R0 of random strings
3. Match each string from R0 against S:
– Match (non-complementary) on at least r contiguous locations: reject
– No match: add string to detector set R
How to generate detectors efficiently is an issue
Match the detector set against target strings to detect an intruder
Strings can be on any alphabet

13 Negative Selection Algorithm [2]
Self string to be protected: 1011 0111 0011 0000
Length of contiguous match substring: r = 2
Match Ab1: 10xx
Match Ab4: xx00
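The negative selection steps can be sketched on this example, treating the self string as four 4-bit substrings with r = 2. One deliberate departure, made so the result is reproducible: the candidate set R0 is enumerated exhaustively (all 16 strings) rather than generated at random. Function names are illustrative.

```python
from itertools import product


def matches(a, b, r):
    """True if a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def negative_selection(self_set, candidates, r):
    """Keep only candidates that match no self string (tolerization)."""
    return [c for c in candidates
            if not any(matches(c, s, r) for s in self_set)]


def detects(detectors, target, r):
    """True if any detector matches the target string."""
    return any(matches(d, target, r) for d in detectors)


SELF = ["1011", "0111", "0011", "0000"]
# Assumption: exhaustive candidates instead of the random set R0.
candidates = ["".join(bits) for bits in product("01", repeat=4)]
detectors = negative_selection(SELF, candidates, r=2)

print(detectors)                      # ['1101']
print(detects(detectors, "1100", 2))  # True: non-self string flagged
print(detects(detectors, "0011", 2))  # False: self string ignored
```

With such a small r relative to the string length, almost every candidate matches some self string, which illustrates why efficient detector generation is an issue.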

14 The Problem of Holes [6]
For a particular choice of matching rule and Ab repertoire, some non-self strings may not be found, causing a hole in the coverage space
Let s1 and s2 be two self strings matching over r-1 contiguous bits and h1 and h2 be two antigens
A detector that matches any r contiguous bits in h1 will also match either s1 or s2 for the same feature string, so it cannot survive negative selection. The same holds for h2. So h1 and h2 are undetectable.
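Holes can be made concrete by brute force on a small case. The sketch below assumes the r-contiguous matching rule with r = 2 and the self set 1011, 0111, 0011, 0000. It builds the complete repertoire of detectors that survive negative selection, then lists every non-self string that no surviving detector matches. Function names are illustrative.

```python
from itertools import product


def matches(a, b, r):
    """True if a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def all_strings(length):
    """Every binary string of the given length, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=length)]


def valid_detectors(self_set, length, r):
    """Every string of the given length that matches no self string."""
    return [d for d in all_strings(length)
            if not any(matches(d, s, r) for s in self_set)]


def holes(self_set, length, r):
    """Non-self strings that no valid detector can match."""
    detectors = valid_detectors(self_set, length, r)
    return [s for s in all_strings(length)
            if s not in self_set
            and not any(matches(d, s, r) for d in detectors)]


SELF = ["1011", "0111", "0011", "0000"]
print(valid_detectors(SELF, 4, 2))  # ['1101']
print(holes(SELF, 4, 2))            # ['0010', '0110', '1000', '1010']
```

In this small case, 4 of the 12 non-self strings are holes even with the complete detector repertoire, so adding more detectors under the same matching rule cannot close them.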

15 Major Application Categories of Immune System Theory
Machine learning and pattern recognition: limited but promising work done to date
Associative memories: limited work done to date
Elimination of identified elements:
– IS model: use a B cell and Tk cell “kill” analogue to disable viruses
– Use a phagocyte analogue for cleanup and garbage collection
– IBM virus lab and Forrest’s group at UNM have looked at this
Recovery, repair and augmentation of identified elements:
– IS model: use the B cell and Tk cell analogue to deliver a positive payload to an agent
– Very little work done to date

16 Application Areas (cont.)
Detection problems, where most of the work has occurred:
– Fault: failure of a self element (industrial plant systems)
– Change: any change in self (tumors)
– Anomaly: unusual presentation of a self element
– Virus: presence of a non-self element
– Intrusion: attempt to gain access by a non-self element
– Many of the classical issues of computer and network security involve some element of detection or self/non-self discrimination

17 The Intrusion Detection Problem
Two classical types of intrusion detection systems (IDS):
– Host-based: domain is a single machine, possibly on a network
– Network-based: domain is a network of hosts
Two classes of problem:
– Anomaly detection: deviations from normal local resource use and network traffic
– Misuse detection: usage identified with known system vulnerabilities and security policies

18 Essential Requirements of a Network-based IDS
Robustness to host failures and noisy signals (anomalous behavior)
Easy (self-)configurability of hosts
Easy extendibility to new hosts
Scalability: extendible to large networks without degradation of performance
Adaptability: dynamically able to recognize new anomalies
Efficiency: simple and low overhead operation
Global analysis: able to correlate local events to form global patterns

19 Network Representation
Commonly represent the problem as the connection events (not message content) between computers
Kim and Bentley [4]:
– Phenotype: 35 real-valued fields in four categories: connection identifier, port vulnerabilities, TCP handshaking, traffic intensity
– Genotype: 35 genes. A detector gene has three “nucleotides” (cluster number in (0, 9), min offset, max offset). An antibody or antigen has a single real value.
– Cluster and offset tables established for each host at start
– A matching function maps an Ag or Ab value to a cluster and takes the distance to the nearest cluster bound as the measure of similarity
– Use positive detection events to evolve the offsets for clusters

20 Kim and Bentley Model [4] (figure; labels: new lower interval bound for cluster 2, new upper interval bound for cluster 2)

21 Network Representation (cont.)
Hofmeyr and Forrest [3]:
– Phenotype: 3 integer fields (source IP address, destination IP address, service or port number)
– Genotype: for a detector, 49-bit binary string + state

22 Algorithmic Refinements [3]
Detectors may have a lifetime, at the end of which they are replaced if they have not matched (maintains diversity)
Activation threshold and time decay on activation level to deter limited autoimmune reaction to rare self strings
Local activation causes a message to be sent to other hosts decreasing their activation levels (cytokine costimulation)
Matching rule may result in holes in coverage. A randomly assigned permutation mask to control packet presentation helps avoid this (MHC molecule host diversity), contributing to population diversity.
Each host has a unique detector set, contributing to diversity and self-protection across a population of hosts
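The lifetime, activation threshold and time decay refinements can be sketched as per-detector state. This is a hypothetical illustration, not the mechanism of [3]: the class name, the multiplicative decay, and the default values (threshold 2.5, decay 0.9, lifetime 5) are all assumptions chosen only to show the behavior.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str
    threshold: float = 2.5   # assumed: activation needed to raise an alarm
    decay: float = 0.9       # assumed: multiplicative decay per time step
    lifetime: int = 5        # assumed: steps before an unmatched detector is replaced
    activation: float = 0.0
    age: int = 0
    ever_matched: bool = False

    def step(self, matched: bool) -> bool:
        """Advance one time step; return True if the detector raises an alarm."""
        self.age += 1
        self.activation *= self.decay  # old matches fade over time
        if matched:
            self.activation += 1.0
            self.ever_matched = True
        return self.activation >= self.threshold

    def expired(self) -> bool:
        """True if the lifetime has elapsed without a single match."""
        return self.age >= self.lifetime and not self.ever_matched


burst = Detector("1101")
print([burst.step(True) for _ in range(3)])  # [False, False, True]

rare = Detector("1101")
print(any(rare.step(t % 10 == 0) for t in range(30)))  # False
```

A burst of matches in quick succession crosses the threshold, while the same detector matching once every ten steps never does, which is the intended tolerance of rare self strings.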

23 The Hofmeyr and Forrest Model [3] (figure; labels: host-based refinement fields, detector state, antigen phenotype)

24 Problem Posed by Computer Applications
The repertoire of human self proteins is fixed over a lifetime
In networks, valid hosts are added and deleted without notice, so what constitutes “self” changes constantly
Among a fixed set of hosts, valid usage patterns may change without notice
One solution: costimulation by a trusted (human) authority, both at start and during subsequent operation
Much work needs to be done

25 Advantages of IS Models
Adaptability through the ability to recognize foreign patterns never before encountered
Distributed detection contributes to:
– Diversity (shape space coverage)
– Robustness (failure of individual hosts)
– Scalability and extendibility
Quick response to new variants of old attacks
Ability to reproduce detectors of increasing fitness while self-regulating the overall population

26 IS as a Swarm System
The IS model has a number of characteristics in common with swarm systems:
– Large populations of independent agents of characterizable classes
– Each agent has at most a very few characteristic simple behaviors:
    – Bind with another appropriate agent and activate (B and T cells)
    – Kill something (killer T cell)
    – Clone myself (B cell)
    – Secrete a signaling chemical or an antibody (T and B cells)
    – Live for a long time (memory B and T cells)
– Simple interactions with the environment:
    – Special things that happen in lymphoid organs
    – Secreting signal chemicals which alter environmental properties (cytokines and inflammation)
– Self-organizing as an emergent property
– No centralized control over the system

27 Areas for Additional Research
Matching rules with good computational properties, perhaps application-specific ones
Self/non-self representation and encoding
Algorithms for generating detector sets
Other selection algorithms
Incorporation of additional IS characteristics
Detector set populations: evolution, dynamics and emergent properties at the species level

28 References
1. Forrest, Smith, Javornik and Perelson. Using Genetic Algorithms to Explore Pattern Recognition in the Immune System. Evolutionary Computation, 1(3):191-211, 1993.
2. Forrest, Allen, Perelson and Cherukuri. Self-Nonself Discrimination in a Computer. Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos CA, 1994.
3. Hofmeyr and Forrest. Immunity by Design: An Artificial Immune System. In Proceedings of 1999 GECCO Conference, 1999.
4. Kim and Bentley. Negative selection and niching by an artificial immune system for network intrusion detection. In Late Breaking Papers at the 1999 Genetic and Evolutionary Computation Conference, Orlando, Florida, 1999.

29 References (cont.)
5. Forrest, Allen, Perelson and Cherukuri. A Change-Detection Algorithm Inspired by the Immune System. Submitted to IEEE Transactions on Software Engineering, 1995.
6. D'haeseleer, Forrest and Helman. An Immunological Approach to Change Detection: Algorithms, Analysis, and Implications. Proceedings of the 1996 IEEE Symposium on Security and Privacy, 1996.

