
Independent Study Report

Artificial Immune Systems


1. Introduction:
The biological immune system is a complex, adaptive system that defends the body against attack by antigens or pathogens. It is able to distinguish between the body's own cells (self cells) and foreign cells (non-self cells). It does so through a distributed, parallel set of components that can take appropriate action from both a local and a global view, using a network of chemical messengers for communication. There are two major branches of the immune system: 1. the innate immune system, a static system that identifies and destroys antigens; and 2. the adaptive immune system, which reacts to previously unknown antigen patterns and develops a response to the antigens it has encountered that can remain in the body for a long time.

The remarkable information-processing capability of the biological immune system has caught the attention of computer engineers around the world, with applications in computer security, anomaly detection, fault tolerance, pattern recognition, etc. The field has also found applications in robotics and, in some cases, in optimization tasks.

2. Overview of Biological Immune Systems:

The biological immune system has evolved over millions of years and is an elaborate defense system. It employs multilevel, overlapping defenses that operate in a parallel and distributed way, although its mechanisms (innate and adaptive) and processes (humoral and cellular) are not yet completely understood.

The biological immune system responds to an attack either by neutralizing the antigenic effect or by destroying the antigen. The response depends on the type of antigen and on the way it enters the body.

The crucial features of the biological immune system are:
a. Affinity (matching)
b. Diversity
c. Distributed operation (no central mechanism)

Affinity, or matching degree, refers to the strength of binding between an antibody and an antigen. Diversity means that there must be many different types of antibody that can act as keys to the antigen locks. Distributed control means that no central mechanism governs the immune response when an antigen attacks; there are only local interactions between immune cells and antigens.

Two types of immune cell play an important role in the immune response: 1. B-cells (bone marrow) and 2. T-cells (thymus). Both types originate in the bone marrow, but T-cells migrate to the thymus to mature and from there circulate in the body through the blood. There are three types of T-cells:
a. Helper T-cells: these cells are important for the activation of B-cells.
b. Killer T-cells: these cells attach to foreign invaders and inject destructive chemical molecules into them, thereby causing their destruction.
c. Suppressor T-cells: these cells suppress autoimmune interactions between cells and thereby contribute to the stabilization of the network.

The B-cells, on the other hand, are responsible for the production of antibodies, which bind to antigens and cause them to be eliminated. Each B-cell generates only one type of antibody (which number in the millions). In the figure below, I-II show the invader entering the body and activating T-cells, which then in IV activate the B-cells; V is the antigen matching, VI the antibody production and VII the destruction of the antigens.

Figure (1) Immune system Cells [6]



From the above description, one can say that the innate immune system is responsible for the primary response and the adaptive immune system for the secondary response.

Hence, the human body is protected against foreign invaders by a multilevel system. The body's defenses comprise the skin, the respiratory system, destructive enzymes and stomach acids. The immune system is divided into two branches:

1. Innate immunity (non-specific); 2. Adaptive immunity (specific). The two systems are linked and influence each other. Adaptive immunity is in turn of two types:

a. Humoral immunity, b. Cell-mediated immunity.

1. Innate immunity: This immunity is congenital. pH, temperature and chemical conditions create an unfavorable living environment for foreign organisms. Extracellular molecules are ingested by macrophages, a process that is influenced by chemical messengers called lymphokines. Because many foreign microorganisms lack the sialic acid found on host cells, C3b stays bound to their surfaces for a longer time. As a result, the membrane attack complex (MAC) is formed, which penetrates the cell surface and kills the foreign cell.

2. Adaptive immunity: It is crucial for learning and memory.

a. Humoral immunity: This kind of immunity is mediated by antibody molecules contained in the body fluids, known as humors. It involves the interaction between B-cells and antigens and the subsequent proliferation and formation of memory cells. When an antibody interacts with an antigen, the antigen can be destroyed in several ways. For instance, antibodies can cross-link antigens, forming clusters that are more readily ingested by macrophages.

b. Cell-mediated immunity: As the name indicates, this immunity is mediated by cells, namely the T-cells. Cytotoxic T-cells participate in cell-mediated immune reactions by killing altered self cells. Cytokines secreted by TDH (delayed-type hypersensitivity T) cells can also mediate this kind of cellular immunity.

3. Artificial Immune Systems Basic Concepts

3.1 Initialization and Encoding:

To implement an artificial immune system, four elements need to be considered: 1. Encoding 2. Similarity measure 3. Selection 4. Mutation. Once an encoding has been chosen, a similarity measure is defined in order to calculate the matching degree; selection and mutation are then performed repeatedly until the stopping criterion is met.

The choice of encoding scheme is very important for the success of the algorithm. As in genetic algorithms, there is a close relationship between the encoding and the fitness function. In artificial immune systems, the role of the fitness function is played by the matching or affinity measure.

We now have to consider two terms, antigen and antibody. An antigen is the target or solution for a given problem, for example the data item to be checked or an intrusion into the system. The antibodies are the remaining data, e.g., the other users in the data set or the network traffic.

Antigens and antibodies are encoded in the same way. The most common choice is a string representation, in which the length is the number of variables, the position identifies the variable and the entry is the value of that variable. For data mining and intrusion detection, a five-variable binary problem can be written as: (10010).

Example, data mining: consider the problem of recommending movies. The encoding has to represent a user's profile with respect to the movies seen and how much each was liked or disliked. A list of numbers representing the votes can serve as the encoding. The votes can be binary or integers in a range such as [0, 5], where 0 indicates that the user did not like the movie and 1 to 5 rate how much the movie was appreciated. A possible encoding scheme for movie recommendation is:

$$ \text{User} = \{\{id_1, score_1\}, \{id_2, score_2\}, \ldots, \{id_n, score_n\}\} \tag{1} $$

where id is the identifier of a movie and score is the score given to it by the user.



Intrusion detection: The encoding looks like [<protocol> <source ip> <source port> <destination ip> <destination port>], for example [<tcp> <113.112.255.254> <108.200.111.12> <25>], which represents an incoming data packet sent to port 25. In these scenarios, wildcards such as "any port" are also often used [2,4].

3.2 Similarity or Affinity Measure:

The matching degree is one of the most important aspects in developing an artificial immune system algorithm. Two matching algorithms for the binary representation are described below.

Now consider two strings below: (0 0 0 0 0) and (0 0 0 1 1)

A bit-by-bit comparison shows that the two strings differ in the last two bits. Counting the matching bits, we can say that the score is 3. This kind of matching is the complement of the Hamming distance, which counts the bits that differ, i.e. the bits that would have to be changed to make the strings identical.

Now consider the strings (00000) and (01010). Once again the score is 3, although the two strings match in a quite different way. This could be a problem. To avoid such an anomaly, we can look for contiguous runs of matching bits and take the length of the longest run as the similarity measure. For the first example the score is then 3, and for the second example it is 1. If we do not want to use a binary representation, a real-valued representation is available, and we can compute the Euclidean distance between the two strings.
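The two binary matching scores discussed above can be illustrated with a minimal Python sketch. The function names `match_count` and `longest_contiguous_match` are chosen here for illustration and do not come from the cited sources.

```python
def match_count(a, b):
    """Number of positions at which two equal-length bit strings agree."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x == y for x, y in zip(a, b))


def longest_contiguous_match(a, b):
    """Length of the longest run of consecutive agreeing positions."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0  # a mismatch resets the current run
        best = max(best, run)
    return best


# The two examples from the text.
print(match_count("00000", "00011"), longest_contiguous_match("00000", "00011"))  # 3 3
print(match_count("00000", "01010"), longest_contiguous_match("00000", "01010"))  # 3 1
```

Both pairs share three matching bits, but only the contiguous measure separates them.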

For data mining, the matching degree is often referred to as correlation. Taking the movie recommendation example, assume that we are looking for the users in the data set that are similar to the main user's profile. What we are trying to do is to measure similarity, and for this we can use the Pearson correlation coefficient between two users. Let u and v be two users:

$$ r = \frac{\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{n} (u_i - \bar{u})^2 \, \sum_{i=1}^{n} (v_i - \bar{v})^2}} \tag{2} $$

Here n is the number of movies for which both u and v have voted, $u_i$ is the vote of user u for movie i, and $\bar{u}$ is the average vote of user u over all movies. The measure is amended to default to a value of 0 if the two users have no films in common.
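As a rough illustration, the Pearson measure can be sketched in Python as follows. The sketch assumes that each user profile is a dictionary mapping a movie identifier to a vote, and that a zero denominator (no variation in the shared votes) also defaults to 0; both choices are assumptions made here, not part of the cited description.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two user profiles (dict: movie id -> vote).

    The sums run over the movies both users voted on; the averages are taken
    over all of each user's votes. Returns 0.0 if there are no common movies.
    """
    common = set(u) & set(v)
    if not common:
        return 0.0
    mean_u = sum(u.values()) / len(u)
    mean_v = sum(v.values()) / len(v)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = sqrt(sum((u[i] - mean_u) ** 2 for i in common)
               * sum((v[i] - mean_v) ** 2 for i in common))
    return num / den if den else 0.0  # assumption: undefined ratio maps to 0


alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 2, "m3": 0}
carol = {"m1": 1, "m2": 3, "m3": 5}
print(pearson(alice, bob))    # same taste
print(pearson(alice, carol))  # opposite taste
```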

The output ranges from -1 to 1, indicating strong disagreement and strong agreement respectively; 0 means no correlation. For data mining, the extreme values 1 and -1 are the most important.

In the negative selection algorithm, the elements that match are eliminated, which mirrors the fact that B-cell maturation tolerates no matching with self molecules or cells. The question now is where negative selection is applied in artificial immune system implementations.

Consider intrusion detection. One way to approach such problems is to define a self set S. A set of detectors is then initialized at random and subjected to a matching algorithm that compares each detector with the self set. Any detector that matches is rejected, so that only the elements that do not match self remain. These non-matching elements make up the resulting detector set, which is used to monitor the network continually. If a detector matches, this is a sign of danger and an alert is raised. Artificial immune systems, a branch of computational intelligence that emerged in the 1990s, are used in computer security, pattern recognition, etc. [2,4,6].

4. Biological Immune System Models

4.1 Negative Selection Principle:

It is well established that the thymus is responsible for the maturation of T-cells and that it is shielded by a blood barrier which excludes non-self antigens from the thymic environment. Hence, the majority of the cells present in the thymic environment are self rather than non-self. As a consequence, T-cells whose receptors recognize self cells are eliminated in the thymus through the biological process termed negative selection. All the mature cells that leave the thymus are self-tolerant, i.e. they do not recognize self cells.

From an information-processing point of view, negative selection performs pattern recognition by collecting the crucial information about the complement (non-self) of the patterns to be identified. Taking inspiration from biology, a negative selection algorithm has been put forward for anomaly detection and fault tolerance.

Define the set to be protected and call it the self set (P). Generate a set of detectors (M) that detect all the elements not belonging to P. The negative selection algorithm proceeds as follows:

1. Produce random candidate elements (C);

2. Compare the elements of C with those of P. If an element of C matches an element of P, discard it; otherwise store it in M.

Once the set M has been created, the next step is to monitor the system for non-self patterns. Consider a set P* to be monitored, which may consist of the elements of P plus some new patterns, or may be a completely new set. Each element of M, which corresponds to a non-self pattern, is checked against the elements of P*; if it recognizes one of them, a non-self pattern has been detected and an action is taken. [12]
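The censoring and monitoring phases described above can be sketched in Python as follows. This is only a minimal illustration: the matching rule (two strings match when they agree in at least r contiguous positions), the parameter names and the cap on random tries are assumptions introduced here, not details given in [12].

```python
import random


def matches(a, b, r):
    """Assumed matching rule: a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Censoring phase: keep random candidates (C) that match no element of P."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)  # stored in M
    return detectors


def monitor(detectors, patterns, r):
    """Monitoring phase: return the patterns recognised by some detector."""
    return [p for p in patterns if any(matches(d, p, r) for d in detectors)]


rng = random.Random(0)
self_set = ["00000000", "00001111"]
M = generate_detectors(self_set, n_detectors=5, length=8, r=4, rng=rng)
print(M)
print(monitor(M, self_set + ["11101000"], r=4))  # self patterns are never flagged
```

By construction no detector matches a self pattern, so anything the monitor flags is non-self under the chosen matching rule.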

Figure (2) Negative Selection Principle [12]

4.2 Clonal Selection:

Clonal selection is the theory used to describe how an immune response is mounted when a non-self pattern is identified by a B-cell; it is complementary to negative selection. The figure shows clonal selection, proliferation and affinity maturation. When a B-cell recognizes an antigen with a certain degree of affinity, it is selected to proliferate and to produce a high volume of antibodies, which bind to the antigens and lead to their elimination with the aid of other immune cells. Proliferation is asexual: it is a mitotic process in which cells divide. The B-cell clones undergo hypermutation, resulting in B-cells with higher affinity towards the antigen. Some B-cells also become memory cells.

From the computational point of view:

1. An antigen selects immune cells to proliferate. The proliferation rate is directly proportional to the affinity: the higher the affinity, the higher the proliferation.

2. The mutation rate is inversely proportional to the affinity.

Figure (3) Clonal selection [12]


Clonal selection resembles a genetic algorithm without the crossover operator. However, genetic algorithms do not have affinity-proportional reproduction and mutation. The CLONALG algorithm has been proposed to include these properties. It was first proposed for pattern recognition and was later adapted to optimization tasks.

Suppose that P is the set of patterns to be recognized. The steps of the CLONALG algorithm are then as follows:

1. Generate a population of patterns (M) randomly.
2. Present each pattern of P to the population M and determine its affinity with every element of M.
3. Identify the individuals of M that have the best affinity. Produce copies of these elements in proportion to their affinity with the antigen: the higher the affinity, the larger the number of copies.
4. Mutate all the copies at a rate inversely proportional to their affinity with the input pattern: the higher the affinity, the lower the mutation rate.
5. Add the mutated elements to the set M and select the matured elements, which are kept as the memory of the system.
6. Iterate steps 2 to 5 until a stopping criterion is met, such as a minimum pattern recognition or classification error.
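The steps above can be sketched in Python for bit-string patterns. This is a simplified illustration under stated assumptions, not the reference CLONALG implementation: affinity is taken to be the number of agreeing bits, the cloning and mutation formulas are one simple way of making them proportional and inversely proportional to affinity, and the parameter names (`pop_size`, `n_best`, `clone_factor`) are hypothetical.

```python
import random


def affinity(cell, antigen):
    """Assumed affinity: number of positions where the bits agree."""
    return sum(c == a for c, a in zip(cell, antigen))


def mutate(cell, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in cell]


def clonalg(patterns, pop_size=20, n_best=5, clone_factor=2, generations=50, seed=0):
    rng = random.Random(seed)
    length = len(patterns[0])
    # Step 1: random initial population M.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    memory = [None] * len(patterns)  # one memory cell per pattern
    for _ in range(generations):                  # Step 6: iterate
        for idx, antigen in enumerate(patterns):  # Step 2: present each pattern
            ranked = sorted(population, key=lambda c: affinity(c, antigen), reverse=True)
            matured = []
            for cell in ranked[:n_best]:          # Step 3: select the best cells
                aff = affinity(cell, antigen)
                n_clones = 1 + clone_factor * aff           # more affinity, more copies
                rate = (length - aff + 1) / (2 * length)    # more affinity, less mutation
                matured.extend(mutate(cell, rate, rng) for _ in range(n_clones))  # Step 4
            # Step 5: the best matured clones replace the worst cells of M.
            matured.sort(key=lambda c: affinity(c, antigen), reverse=True)
            population = ranked[:pop_size - n_best] + matured[:n_best]
            best = max(population, key=lambda c: affinity(c, antigen))
            if memory[idx] is None or affinity(best, antigen) > affinity(memory[idx], antigen):
                memory[idx] = best
    return memory


patterns = [[1, 0, 1, 0, 1, 0, 1, 0], [1, 1, 1, 1, 0, 0, 0, 0]]
for antigen, cell in zip(patterns, clonalg(patterns)):
    print(antigen, cell, affinity(cell, antigen))
```

Because the best cells are always retained, the affinity of each memory cell can only grow from one generation to the next.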

This algorithm makes artificial immune systems good at pattern recognition: CLONALG learns to recognize patterns through evolutionary-like behavior. [12]


4.3 Immune Network:

The immune network theory states that the immune system exhibits dynamic behavior even when no antigen is present. How does this happen? It is proposed that immune cells and molecules are able to recognize each other. The theory has been criticized by many immunologists, but the computational features of immune networks are very important in robotics.

According to this theory, the molecules on the surface of an antibody that are recognized by other antibodies are called idiotopes.

To explain the theory, assume that an antibody Ab1 recognises an antigen Ag. Now imagine that Ab1 also recognises an idiotope of another antibody Ab2. Since Ab1 recognises both Ab2 and Ag, we say that Ab2 is an internal image of Ag. Such recognition of idiotopes between molecules gives rise to a network of connected cells, i.e. a network of affinities. Within this network, antibody-antibody recognition results in network suppression, while antibody-antigen recognition results in network activation and cell proliferation.

The recognition of one antibody by another thus results in network suppression. This idea is modeled by eliminating all but one of the self-recognising cells.

Figure (4) Immune Network [12]



Let the set P contain the patterns to be recognized.

1. Generate the network population randomly.
2. For every element of P, apply CLONALG, which yields a set M* of memory cells and their coordinates for the current antigen.
3. Calculate the affinity between the elements of M*.
4. Eliminate all but one of the elements of M* whose mutual affinity is greater than a prescribed threshold. The intent is to remove redundancy from the network by suppressing self-recognising elements.
5. Combine the elements remaining after step 4 with the remaining elements found for each antigen presented so far. This gives the set M.
6. Calculate the matching degree between all elements of M and suppress all but one of the self-recognising elements.
7. Iterate steps 2 to 6 until the desired result is attained. [12]
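The suppression used in steps 4 and 6 can be illustrated with a small Python sketch. It assumes bit-string cells and takes the number of agreeing bits as the cell-to-cell affinity; the greedy order in which cells are kept is also an assumption made here for simplicity.

```python
def cell_affinity(a, b):
    """Assumed similarity between two cells: number of agreeing bits."""
    return sum(x == y for x, y in zip(a, b))


def suppress(cells, threshold):
    """Keep one representative of every group of mutually recognising cells.

    A cell is dropped when its affinity to an already kept cell exceeds
    `threshold`, so that all but one of the self-recognising cells vanish.
    """
    kept = []
    for cell in cells:
        if all(cell_affinity(cell, other) <= threshold for other in kept):
            kept.append(cell)
    return kept


print(suppress(["0000", "0001", "1111", "1110"], threshold=2))  # ['0000', '1111']
```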

5. Modeling the Biological Immune Systems

5.1 Shape-Space Model:

The interaction between antibody and antigen is of central importance in the immune system. The concept of shape-space was introduced by Perelson and Oster in 1979 to describe quantitatively the interactions between immune cell molecules and antigens.

According to this concept, an antigen can be recognized within a region around an antibody known as the recognition region. The binding between an antibody and an attacking antigen usually involves short-range non-covalent interactions based on electrostatic charge, hydrogen bonding, van der Waals forces, etc. The molecules must interact with each other over a sufficient portion of their respective surfaces, so there must be extensive regions of complementarity.

The existence of chemical groups, together with the shape and the charge distribution, are the characteristic properties of antigens and antibodies that determine the interactions between these molecules. This set of features is called the generalized shape of a molecule [1].

Imagine that the generalized shape of an antibody combining site can be described by L parameters: the length, height and width of any bump or groove in the combining site, its charge, etc. The exact number of parameters and their values are not important here. A point in an L-dimensional space, called shape-space, then specifies the generalized shape of an antigen-binding region with regard to its antigen-binding properties.

If an organism has a repertoire of size N, the shape-space contains N points. These points lie within a finite volume V of the space, because there is only a limited range of lengths, widths, charges, etc. that an antibody combining site can assume. Antigenic determinants (epitopes) are likewise characterized by generalized shapes, whose complements lie within V, since Ag-Ab interactions are measured via regions of complementarity.

The antigen and the antibody need not match exactly; they may also match with lower affinity. Each paratope interacts with all the epitopes that lie within a small surrounding region of radius e, the recognition region. Since each antibody can recognize any epitope within its recognition region, and since an antigen can present different types of epitopes, a finite number of antibodies can recognize an almost infinite number of points in the volume V. This is related to the cross-reactivity phenomenon in biological immune systems. In the shape-space model, similar patterns occupy adjacent regions of the shape-space and may be recognized by the same antibody shape, provided that e is large enough [6].

Figure (5) Shape-Space Model [6]

5.2 Ag-Ab Representations and Affinities:

The Ag-Ab representation determines the distance measure that can be used to calculate the degree of interaction between these molecules. Mathematically, there are three ways to represent antibody-antigen pairs and to determine their matching strength: 1. Euclidean shape-space 2. Manhattan shape-space 3. Hamming shape-space [4]

The generalized shape of a molecule m, either antibody or antigen, can be represented by a set of real-valued coordinates m = <m1, m2, ..., mL>, so that m is a point in an L-dimensional real-valued shape-space.

The affinity between an antibody and an antigen is related to the distance between the two strings or vectors, measured for example by the Euclidean or the Manhattan distance. If the coordinates of an antibody are given by <ab1, ab2, ..., abL> and those of an antigen by <ag1, ag2, ..., agL>, then the distance (D) between them is:

$$ D = \sqrt{\sum_{i=1}^{L} (ab_i - ag_i)^2} \tag{3} $$

$$ D = \sum_{i=1}^{L} |ab_i - ag_i| \tag{4} $$

Eqn (3) gives the Euclidean distance and Eqn (4) the Manhattan distance.

Shape-spaces that use real-valued coordinates and measure distance in the form of eq (3) are called Euclidean shape-spaces, and those that use the form of eq (4) are called Manhattan shape-spaces.
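For real-valued coordinates, the Euclidean and Manhattan distances take only a few lines of Python. The function names are illustrative and the coordinate values in the usage example are made up.

```python
from math import sqrt


def euclidean(ab, ag):
    """Euclidean distance between an antibody and an antigen vector."""
    return sqrt(sum((a - g) ** 2 for a, g in zip(ab, ag)))


def manhattan(ab, ag):
    """Manhattan distance between an antibody and an antigen vector."""
    return sum(abs(a - g) for a, g in zip(ab, ag))


ab = [0.0, 0.0]
ag = [3.0, 4.0]
print(euclidean(ab, ag))  # 5.0
print(manhattan(ab, ag))  # 7.0
```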

Another shape-space is the Hamming shape-space, in which antigens and antibodies are represented as sequences of symbols over an alphabet of size k. Such sequences can be interpreted as peptides, and the different symbols as characteristic properties of amino acids. In the context of artificial immune systems, shape and sequence are treated as equivalent.


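$$ D = \sum_{i=1}^{L} \delta_i, \qquad \delta_i = \begin{cases} 1 & \text{if } ab_i \neq ag_i \\ 0 & \text{otherwise} \end{cases} \tag{5} $$

Equation (5) gives the Hamming distance measure.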

Equations (3) to (5) show how to determine the affinity between molecules in Euclidean, Manhattan and Hamming shape-spaces, respectively. To study cross-reactivity, it is important to establish the relation between the distance D, the recognition region and the matching threshold.

When the distance between two sequences is maximal, the molecules are exact complements of each other and their affinity is also maximal. In the other cases, where the match is not perfect, real-valued spaces and Hamming spaces have to be treated differently when measuring Ag-Ab interactions.

In Euclidean and Manhattan shape-spaces, a limit on the magnitude of each shape-space parameter can be imposed. Moreover, the distance can be normalized, for example over the interval [0, 1], so that the matching strength also lies in the same range.

If we assume a binary representation of Ag-Ab interactions, the interaction has a clear graphical interpretation in Hamming shape-space. In the bitstring universe, molecular binding takes place when the bitstrings are complementary to each other. For example:
ab = <0 0 1 0 0 1 0>
ag = <1 1 0 1 1 0 1>


Figure (6) Antigen- Antibody perfect matching using bit-string representation [6]

The affinity between an antibody and an antigen is the number of complementary bits in their string representations, which can be computed with the XOR operator. The expected matching strength between two randomly chosen bitstrings of the same length is half of their length.
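A short Python sketch makes the XOR-based affinity concrete. It reuses the bitstrings of the example above and checks the "half of the length" statement by enumerating every pair of strings of a small length; the function names are illustrative.

```python
from itertools import product


def affinity(ab, ag):
    """Number of complementary bits, computed with XOR."""
    if len(ab) != len(ag):
        raise ValueError("bitstrings must have the same length")
    return sum(a ^ g for a, g in zip(ab, ag))


ab = [0, 0, 1, 0, 0, 1, 0]
ag = [1, 1, 0, 1, 1, 0, 1]
print(affinity(ab, ag))  # 7: every bit is complementary


def mean_affinity(length):
    """Average affinity over all ordered pairs of bitstrings of this length."""
    strings = list(product([0, 1], repeat=length))
    total = sum(affinity(a, b) for a in strings for b in strings)
    return total / len(strings) ** 2


print(mean_affinity(4))  # 2.0, i.e. half of the length
```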

A binding value indicates whether the molecules are bound or not, in other words whether the antigen is recognized by the antibody. Several activation functions can be used to obtain the binding value from the distance between the Ab and Ag molecules.

With a threshold function, a bond is established only when the match score is at least (L - e).

In the continuous case, a sigmoid function is suitable, with e determining the inflexion point of the curve.
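The two binding functions can be sketched as follows. The sketch assumes that a bond forms when the match score reaches L - e (so that e = 0 demands a perfect match) and that the sigmoid is a logistic curve whose inflexion point sits at a match score of L - e; the logistic form and the `steepness` parameter are assumptions made here, since the text does not give the exact formula.

```python
from math import exp


def threshold_binding(match_score, length, e):
    """Return 1 if the molecules bind, 0 otherwise (step function)."""
    return 1 if match_score >= length - e else 0


def sigmoid_binding(match_score, length, e, steepness=1.0):
    """Continuous binding value in (0, 1); equals 0.5 at match_score = length - e."""
    return 1.0 / (1.0 + exp(-steepness * (match_score - (length - e))))


L, e = 7, 2
for score in range(L + 1):
    print(score, threshold_binding(score, L, e), round(sigmoid_binding(score, L, e), 3))
```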

In the Hamming shape-space, the set of all possible antigens is considered as a space of points, where antigenic molecules with similar shapes occupy adjacent points in the space. The total number of unique antibodies and antigens is $k^L$, where k = size of the alphabet and L = the bitstring length.

A given antibody covers some portion of the shape-space, namely the set of antigens it recognizes. The matching threshold e determines the coverage provided by a single antibody; when e = 0, a perfect match is necessary, meaning that the antibody and the antigen must be exact complements of each other. The number of antigens covered within a region of radius e is given by:

$$ C = \sum_{i=0}^{e} \binom{L}{i} = \sum_{i=0}^{e} \frac{L!}{i!\,(L-i)!} \tag{6.1} $$

C = coverage of the antibody, L = length of the bitstring, e = matching threshold.

Based on eqn (6.1), for a given bitstring length L and matching threshold e, the minimum number of antibody molecules (N) necessary to cover the shape-space completely can be defined as

$$ N = \mathrm{ceil}\left(\frac{k^L}{C}\right) \tag{6.2} $$

where k = 2 for bitstrings and ceil is the operator that rounds the value in parentheses up to the nearest integer [2,4,6].
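As a worked example, the coverage of one antibody and the minimum repertoire size can be computed in Python. The sketch assumes the coverage is the sum of the binomial coefficients of L over i for i = 0..e and that N is the size of the shape-space divided by C, rounded up, with a binary alphabet (k = 2) by default; the function names are illustrative.

```python
from math import comb


def coverage(length, e):
    """Number of antigens within Hamming radius e of an antibody's complement."""
    return sum(comb(length, i) for i in range(e + 1))


def min_antibodies(length, e, k=2):
    """Lower bound on the repertoire size needed to cover the whole shape-space."""
    c = coverage(length, e)
    return -(-k ** length // c)  # integer ceiling of k**length / c


for e in (0, 1, 2, 8):
    print(e, coverage(8, e), min_antibodies(8, e))
# e = 0 -> C = 1,   N = 256 (one antibody per antigen)
# e = 1 -> C = 9,   N = 29
# e = 2 -> C = 37,  N = 7
# e = 8 -> C = 256, N = 1
```

Increasing the threshold e lets each antibody cover more antigens, so fewer antibodies are needed.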


6. The AIS Model


The artificial immune system model proposed by J.D. Farmer and N.H. Packard is simple enough to simulate on a computer, yet contains enough realism to embody the characteristic properties of the network. The model leaves out many features, such as T-cells and macrophages, while retaining the essence of the idiotypic network.

The sequences of amino acids that specify the chemical properties of the epitope and the paratope are represented as binary strings. In this picture, antibodies are viewed as being composed of only two kinds of amino acid, 0 and 1. Alternatively, a sequence of five binary digits can be taken to correspond to one amino acid, which is enough to represent the twenty amino acids. A further simplification is that each antigen and antibody has only one epitope, whereas in reality an antigen or antibody has many different epitopes [5].

Thus, an antibody is represented as a pair (p, e), where p is the paratope string and e is the epitope string. The allowed reactions between different antibodies, and between antibodies and antigens, are found by searching for complementary matches between strings.

Exact string matching is not required. The strings are allowed to match in any possible alignment, in order to model the fact that two molecules may react in more than one way. Let $l_e$ denote the length of the epitope string and $l_p$ the length of the paratope string. A matching threshold s is defined, with $s \le \min(l_e, l_p)$, below which two antibodies will not react at all. Let $e_i(n)$ denote the value of the n-th bit of the i-th epitope string and $p_j(n)$ the value of the n-th bit of the j-th paratope string [1,2].


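The matrix of matching specificities $m_{ij}$ is then given by:

$$ m_{ij} = \sum_{k} G\left( \sum_{n} e_i(n+k) \oplus p_j(n) - s + 1 \right) \tag{7} $$

In equation (7), $\oplus$ represents the exclusive-or operation used for complementary matching.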

6.1 Procedure Used for Computing Partial Matches:

Figure (7) Epitope and Paratope string matching [5]

In this example, $l_e = l_p = 8$ and s = 6, so alignments with $-2 \le k \le 2$ are possible. Here k = -1, so that $e_i(n-1)$ is compared to $p_j(n)$. For the above example, G = 1 for k = -1 and G = 0 for all other values of k, hence $m_{ij} = 1$.

The function G is defined by G(x) = x for x > 0 and G(x) = 0 otherwise. The sum over n ranges over all possible positions on the epitope and paratope; the sum over k allows the epitope to be shifted with respect to the paratope. G determines the strength of a possible reaction between the epitope and the paratope. For a given alignment, i.e. value of k, G is 0 if fewer than s bits are complementary and G = 1 + $\delta$ when s + $\delta$ bits are complementary, $\delta \ge 0$ being the number of complementary bits in excess of the threshold. If matches occur at more than one alignment, we sum their strengths, to reflect that the molecules might be able to interact in more than one way and thus react more strongly, because they spend more time together than molecules that can interact in only one alignment [5].
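The partial-matching procedure can be sketched in Python. The sketch assumes that, for each shift k, only the positions where the shifted epitope and the paratope overlap are compared, and that the match specificity is the sum over k of G(number of complementary bits - s + 1) with G(x) = x for x > 0 and 0 otherwise; the function name and the test strings are made up for illustration.

```python
def match_specificity(epitope, paratope, s):
    """Sum, over all alignments k, of G(complementary bits - s + 1)."""
    le, lp = len(epitope), len(paratope)
    total = 0
    for k in range(-(lp - 1), le):  # every shift with at least one overlapping bit
        complementary = 0
        for n in range(lp):
            idx = n + k             # epitope position facing paratope bit n
            if 0 <= idx < le:
                complementary += epitope[idx] ^ paratope[n]
        x = complementary - s + 1
        if x > 0:                   # G(x) = x for x > 0, else 0
            total += x
    return total


# An all-zero epitope against an all-one paratope is complementary at every
# alignment: k = 0 gives 8 - 6 + 1 = 3, k = +/-1 gives 2 each, k = +/-2 gives 1 each.
print(match_specificity([0] * 8, [1] * 8, s=6))  # 9
```

Shifts with fewer than s overlapping positions can never reach the threshold, which is why only alignments with |k| <= 2 contribute when both strings have length 8 and s = 6.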

In this model, free antibodies and antibodies attached to cells are lumped together, and only the total number of antibodies of a given type i is kept track of, through the concentration variable $x_i$.

What happens when two different antibodies interact? Farmer and Packard assume that the paratope on one antibody recognizes the epitope on the other. They further assume that, as a result of this interaction, the antibody with the paratope reproduces some fixed number of times while, with some fixed probability, the antibody with the epitope is eliminated. The degree to which one antibody reproduces and the other dies is controlled by the degree of complementarity between the paratope and the epitope. So, the model is symmetric with regard to antibody interaction.

Suppose there are N antibody types with concentrations {$x_1, \ldots, x_N$} and n antigens with concentrations {$y_1, \ldots, y_n$}. It is possible to avoid simulating the microscopic dynamics by writing differential equations for the concentrations. This is possible only when the system is well mixed and sufficiently large, so that the number of interactions needed to produce a significant change in the concentration of any particular type of antibody is huge.


On the basis of these assumptions:

$$ \frac{dx_i}{dt} = c\left[ \sum_{j=1}^{N} m_{ji}\, x_i x_j - k_1 \sum_{j=1}^{N} m_{ij}\, x_i x_j + \sum_{j=1}^{n} m_{ji}\, x_i y_j \right] - k_2 x_i \tag{8} $$

In equation (8), the first term represents the stimulation of the paratope of an antibody of type i by the epitope of an antibody of type j. The second term represents the suppression of the antibody of type i by an antibody of type j. The probability of a collision between an antibody of type i and an antibody of type j is represented by the term $x_i x_j$, and the parameter c is a rate constant that depends on the number of collisions per unit time and on the rate of antibody production stimulated by a collision.

The match specificities $m_{ij}$ indicate which reactions occur and how strongly. $k_1$ represents a possible inequality between stimulation and suppression. When $m_{ij} = m_{ji}$, the interactions between paratopes and epitopes are symmetrical and the model is similar to the one proposed by Hoffman.

To model the entire immune response, the concentrations of the antigens are also introduced (third term); they may change as the number of antigens increases or decreases. The last term models the death rate. The best way is to change $k_2$ in such a way that the total concentration of the system stays at a fixed value [5].

The list of antibody and antigen types is dynamic: it changes as new types are added or removed. The values of N and n therefore change with time, but on a time scale that is slow compared to the changes in $x_i$. Eqn. (8) is integrated over a period of time, after which the composition of the system is examined and updated as needed. To update it, a minimum threshold is placed on all concentrations, so that a variable and all of its reactions are eliminated when its concentration falls below the threshold.
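The integrate-then-prune cycle can be sketched in Python with a plain explicit Euler step. The sketch assumes that equation (8) has the form dx_i/dt = c [ sum_j m_ji x_i x_j - k1 sum_j m_ij x_i x_j + sum_j m_ji x_i y_j ] - k2 x_i; the argument layout (`m[i][j]` for antibody-antibody specificities, `m_ag[j][i]` for the match between antigen j and the paratope of antibody i) and the Euler scheme itself are choices made here for illustration.

```python
def euler_step(x, y, m, m_ag, c, k1, k2, dt):
    """Advance the antibody concentrations x by one explicit Euler step."""
    n_ab = len(x)
    new_x = []
    for i in range(n_ab):
        stimulation = sum(m[j][i] * x[i] * x[j] for j in range(n_ab))
        suppression = sum(m[i][j] * x[i] * x[j] for j in range(n_ab))
        antigen_term = sum(m_ag[j][i] * x[i] * y[j] for j in range(len(y)))
        dx = c * (stimulation - k1 * suppression + antigen_term) - k2 * x[i]
        new_x.append(x[i] + dt * dx)
    return new_x


def surviving_types(x, threshold):
    """Indices of the antibody types whose concentration is still above threshold."""
    return [i for i, xi in enumerate(x) if xi >= threshold]


# Two antibody types: the paratope of type 1 recognises the epitope of type 0,
# so type 1 is stimulated and type 0 is suppressed.
m = [[0, 1],
     [0, 0]]
x = euler_step([1.0, 1.0], [], m, [], c=1.0, k1=1.0, k2=0.0, dt=0.1)
print(x)                                        # approximately [0.9, 1.1]
print(surviving_types([0.5, 0.001, 0.2], 0.01)) # [0, 2]
```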

New antibody types are generated by applying genetic operators, namely crossover, inversion and point mutation, to the paratope and epitope strings. In crossover, two antibody types are selected at random, a position within the two strings is chosen at random, and the pieces on one side of the chosen position are interchanged to produce two new types. Epitopes and paratopes are crossed over separately. Point mutation is implemented by randomly changing one of the bits in a given string, and inversion by inverting a randomly chosen segment of the string.
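The three operators can be sketched in Python on bit lists. To keep the functions easy to check, the positions are passed in explicitly and drawn at random by the caller. Inversion is interpreted here as reversing the order of the chosen segment, which is the usual meaning in genetic algorithms but is an assumption with respect to the text.

```python
import random


def crossover(a, b, point):
    """Swap the tails of two strings after `point`, giving two new strings."""
    return a[:point] + b[point:], b[:point] + a[point:]


def point_mutation(s, pos):
    """Flip the single bit at position `pos`."""
    return s[:pos] + [1 - s[pos]] + s[pos + 1:]


def inversion(s, start, end):
    """Reverse the order of the segment s[start:end] (assumed meaning)."""
    return s[:start] + s[start:end][::-1] + s[end:]


rng = random.Random(1)
parent_a, parent_b = [0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]
child_a, child_b = crossover(parent_a, parent_b, rng.randrange(1, len(parent_a)))
print(child_a, child_b)
print(point_mutation(parent_a, rng.randrange(len(parent_a))))
print(inversion([1, 0, 0, 1, 1], 0, 3))  # [0, 0, 1, 1, 1]
```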

Antigens can be generated by a variety of mechanisms, either randomly or by design. The same antigen type can be presented to the system to see whether the system can eliminate it. Once the system has learned to eliminate it, a number of antigens can be presented to see whether the system forgets or remembers how to eliminate the antigen. The number of antigens presented to the system can be varied [5].

The antibodies whose paratopes match epitopes are amplified at the expense of other antibodies. If $k_1 = 1$ (equal suppression and stimulation) and $k_2 > 0$, then every antibody type eventually dies out because of the damping term. Letting $k_1 < 1$ favors the formation of reaction loops, since all the members of a reaction loop gain concentration and can thus neutralize the damping term. As N increases, the number of loops and their lengths also increase.

Even when the system is disturbed by the introduction of new types, it can remember certain states thanks to the robustness of the reaction loops. Antibodies that recognize other molecules, internal or external, are retained in the system and their concentration increases, whereas antibodies that do not recognize other molecules are eliminated. Hence, together with immunological memory, the system possesses immunological forgetting [5].


In the bio-logical immune system, antigens are sometimes restored in the system for long time which is comparable to lifespan of organism. The exact reason for this is not now known. One theory states that the antigen remain in degraded form in lymph nodes and their periodic exposure to immune system retain memory. But as antigens are potentially dangerous, this theory is highly risky. Another theory is that the B-cells that have reacted to antigens undergo the dormant state and surface up when similar or kind of antigen occurs again. Such dormant state can last for periods of weeks or may be months [1]. Another hypothesis is proposed by Farmer and Packard by means of idiotypic network.

6.2 Hypothesis: Let ab1 be the concentration of the antibodies that recognize the antigen, and let ab2 be the concentration of the antibodies that recognize the epitopes of the ab1 antibodies. Continuing in this way, let abn be the concentration of the antibodies that recognize the epitopes of the ab(n-1) antibodies. If abn resembles the original antigen, a loop is formed, because ab1 is going to recognize abn [3].

Figure (8) The formation of a cycle allows the antigen with epitope e0 to be remembered.[5]


Arrows denote recognition through the string matching algorithm. Paratope p(i) recognizes epitope e(i-1) for i = 1, 2, ..., n. To form a cycle, we assume that by chance p(1) recognizes e(n) in addition to e(0). Thus, e(n) must resemble the antigen e(0). If the antigen is eliminated, the existence of the cycle can maintain the concentration of ab1, an antibody that specifically recognizes the antigen [5]. If the paratopes are also assumed to function as epitopes, then e(n) necessarily resembles the antigen [5].

7. String Matching Rules


A matching rule, which defines matching or recognition, and the distance measure on which it is based are the cornerstones of any detection, classification, or recognition algorithm. For categorical data, a string representation may be more suitable, and a matching rule such as rcb (r-contiguous bits) is useful [7]. Several string-matching rules are described below:

7.1 Hamming Distance: It is defined as the number of positions at which two strings differ. The Hamming distance between strings x and y is expressed as:

D(x, y) = \sum_{i=1}^{N} (x_i \oplus y_i)    (9)

where N is the length of the strings, x_i and y_i represent the i-th bits of the respective strings, and the operation within the brackets, \oplus, is the x-or operation [7].
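The Hamming distance above can be sketched in a few lines of Python. This is a minimal illustration, not code from [7]; the function name `hamming_distance` is chosen here for the example, and strings of characters are compared position by position, which for bit strings is equivalent to summing the x-or of the bits.

```python
def hamming_distance(x: str, y: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    # For bit strings, (a != b) plays the role of the x-or in equation (9).
    return sum(a != b for a, b in zip(x, y))


print(hamming_distance("10110", "11100"))  # the strings differ in 2 positions
```

The same function works for any alphabet, since it only tests characters for equality.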


7.2 Binary Distance:

(10)

Extensions of the Hamming distance have been proposed that are based on the number of bits that match or differ.

(11)

(12)

(13)

a counts the number of 1s that match at the same position of both the strings; d enumerates the number of 0s that match at the same position of both the strings; b counts the number of 1s in string x that do not match string y; and c counts the number of 0s in string x that do not match string y [7].


Different similarity measures have been developed, which are as follows:

1. Russell and Rao (13)

2. Jaccard and Needham (14)

3. Kulczynski (15)

4. Sokal and Michener (16)

5. Rogers and Tanimoto (17)

6. Yule (18)
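As a rough illustration of how such measures are computed from the counts a, b, c and d defined earlier, the sketch below uses the forms of these coefficients that are commonly cited in the literature. These forms are an assumption made for this example: variants exist (in particular for the Kulczynski and Rogers and Tanimoto coefficients), and they may differ in detail from the equations given in [7]. The function names are hypothetical.

```python
from typing import Dict, Tuple


def match_counts(x: str, y: str) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length bit strings."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    pairs = list(zip(x, y))
    a = sum(p == "1" and q == "1" for p, q in pairs)  # 1s that match
    b = sum(p == "1" and q == "0" for p, q in pairs)  # 1 in x, mismatch in y
    c = sum(p == "0" and q == "1" for p, q in pairs)  # 0 in x, mismatch in y
    d = sum(p == "0" and q == "0" for p, q in pairs)  # 0s that match
    return a, b, c, d


def ratio(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def similarity_measures(x: str, y: str) -> Dict[str, float]:
    """Commonly cited forms of the six coefficients (assumed, see lead-in)."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    return {
        "russell_rao": ratio(a, n),
        "jaccard_needham": ratio(a, a + b + c),
        "kulczynski": ratio(a, b + c + 1),  # one of several variants
        "sokal_michener": ratio(a + d, n),
        "rogers_tanimoto": ratio(a + d, a + d + 2 * (b + c)),
        "yule": ratio(a * d - b * c, a * d + b * c),
    }


for name, value in similarity_measures("1100", "1010").items():
    print(f"{name}: {value:.3f}")
```

Returning 0.0 for a zero denominator is a convenience of this sketch rather than part of any of the definitions.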

7.3 Edit Distance: It is defined as the minimum number of string transformations required to change string s1 into string s2, where the possible transformations are (i) changing a character, (ii) inserting a character and (iii) deleting a character. It is also termed the Levenshtein distance and is a generalization of the Hamming distance [7].
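The definition above can be illustrated with the standard dynamic-programming computation of the Levenshtein distance. This is a minimal sketch with unit cost for each of the three transformations; the function name `edit_distance` is chosen for the example.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    # prev[j] holds the distance between the current prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            change = prev[j - 1] + (0 if c1 == c2 else 1)
            insert = curr[j - 1] + 1
            delete = prev[j] + 1
            curr.append(min(change, insert, delete))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Unlike the Hamming distance, this measure is defined for strings of different lengths.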

Value Difference Metric: (19)

where P(x_i = c) denotes the probability that x_i equals the character c in the alphabet C [7].


7.4 Landscape Affinity Matching: This type of matching is used to capture the notion of matching biochemical and physical structures and the approximate matching that occurs in the immune system. The input string and the antibody string are converted to bytes and then to positive integers to create a landscape. The two strings are compared using a sliding window [7]. Three different similarity measures are defined as:

Difference matching rule: (20)

Slope matching rule: (21)

Physical matching rule: (22)

7.5 R-Contiguous Bits Matching: The rcb matching rule is defined as follows: if x and y are equal-length strings, they are said to match if they agree in at least r contiguous locations, in which case match(x, y) is true.

Example: If x = ABADCBAB and y = CAGDCBBA, then match(x, y) is true for r <= 3. For binary strings, a detector d under the rcb rule is specified by a binary string c and a threshold value r, and d matches a string x if at least r contiguous bits of c equal the bits at the same positions of x [7].
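A minimal sketch of the rcb rule follows. It tracks the length of the current run of agreeing positions and reports a match as soon as the run reaches r; the function name `rcb_match` is chosen for this example.

```python
def rcb_match(x: str, y: str, r: int) -> bool:
    """True if x and y agree in at least r contiguous positions."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    run = 0  # length of the current run of matching positions
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


# The example from the text: the strings agree on the run "DCB".
print(rcb_match("ABADCBAB", "CAGDCBBA", 3))  # True
print(rcb_match("ABADCBAB", "CAGDCBBA", 4))  # False
```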

7.6 R-chunk Matching Rule: The r-chunk matching rule is a generalization of the rcb matching rule. In the rcb matching rule, a detector d is specified by a binary string c and a parameter r. An r-chunk detector d is said to match a string x if all the bits of c are equal to the r bits of x in the window defined for c. In r-chunk matching, detectors of any size can be used and hence the detectors cover more self-space. The matching window is specified for each detector. A cluster of r-chunk detectors that covers all possible windows has the same effect as an rcb detector.

Example: Let x = e1 e2 ... em (an element in shape space) and d = (p; d1 d2 ... dr), with r <= m and p <= m-r+1. According to the r-chunk rule, x and d match if ei = di for i = p, ..., p+r-1. In other words, element x matches detector d if at position p there is a sequence of length r in which all characters are equal [7].
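The r-chunk rule can be sketched as a comparison of one window of x against the detector's chunk. In this illustration the detector is passed as a start position p (1-based, as in the example above) and a chunk string of length r; this representation and the function name `r_chunk_match` are assumptions made for the example.

```python
def r_chunk_match(x: str, p: int, chunk: str) -> bool:
    """True if the r characters of x starting at 1-based position p equal chunk."""
    r = len(chunk)
    if not 1 <= p <= len(x) - r + 1:
        raise ValueError("window does not fit inside x")
    return x[p - 1 : p - 1 + r] == chunk


x = "ABADCBAB"
print(r_chunk_match(x, 4, "DCB"))  # True: positions 4..6 of x are "DCB"
print(r_chunk_match(x, 1, "DCB"))  # False: positions 1..3 of x are "ABA"
```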

8. Learning Using Artificial Immune Systems


J. E. Hunt and D. E. Cooke developed an artificial immune system composed of a bone marrow object, a network of B-cell objects and an antigen population. This AIS was implemented in CLOS (Common Lisp Object System), so B cells, antigens, etc. are classes.


8.1 The Bone-Marrow Object

The bone marrow object decides where in the network an antigen is inserted and which B cells die, which in turn increases the concentration of the cells that are beneficial to the network. It holds the main algorithm, which starts the immune response by inserting an antigen into the B-cell network. The algorithm is as follows:

Randomly initialize the B-cell population
Load the antigen population
Until the end is reached DO
    Select an antigen at random from the antigen population
    Insert the selected antigen at a random point in the B-cell network
    Select a percentage of the B cells around the insertion point
    For every B cell selected DO
        Let the antigen and the B cell interact to trigger the immune response
    Arrange these B cells by their level of avidity
    Delete the worst 5% of the B-cell population
    Create n new B cells (n = 25% of the B-cell population)
    Out of these n, select m cells to join the immune network (m = 5% of the population) [9]

B-cell Object: The B-cell object possesses a pattern-matching element. It records the affinity level of the B cell and maintains the links to any other B-cell objects to which it is connected within the network of B cells.

Antibodies: When an antigen meets an antibody, an immune response is elicited and a match score is recorded. If this score is greater than or equal to a threshold, binding between the antibody and the antigen occurs.


Antigens: Each potential antigen is represented by an antigen object possessing one epitope. The antigens are defined in external ASCII files and are inserted into the AIS by the antigen population object. This object reads a series of lists from the files and instantiates them as antigen objects.

B-cell Stimulation

)] -

Above equation represents the stimulation of B-cell.

8.2 Applying AIS to a Pattern Recognition Problem

1. B-cell Objects: The antibody's paratope is created from an mRNA list. The AIS copies the bit string in a complementary manner.

2. Antibodies: A bit-string representation is used for the pattern recognition problem, so each antibody is a string of 0s and 1s.

3. Antigens: The AIS is tested with two different antigen populations whose antigens are binary lists of 20 elements. The antigen population used to immunize the AIS contains three pattern types, each forming 33% of the population. The population consists of the original patterns as well as modified bit strings that introduce noise into the data.

Antigen Population Representation:

Pattern                 Share of population
11111111110000000000    33%
00000000001111111111    33%
00000111111111100000    33%

4. Antigen/Antibody: To determine the match between antigen (Ag) and antibody (Ab), the match is not required to start at a fixed point on the antigen; instead, a circular approach is followed. Hence, if the pattern described by the antibody starts halfway along the antigen, the antibody is shifted halfway along its length and an entire match is noted.

Bit-shifted antibody:

Antibody:              0010101110
Antigen:               1000111010
Bit-shifted antibody:  0111000101

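8.3 The match algorithm:

Repeat
    For each region consisting of 2 or more 1s, note its length
    If the current match value exceeds the best match value so far, then set the best match value to the current match value
    Shift Ab right by 1 bit
Until the Ab shift is complete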

Calculating Match Value:

Antigen:     0 1 1 0 0 0 0 1 1 1 1 0 1 1 0
Antibody:    1 0 0 1 1 1 0 0 0 1 0 1 1 0 1
XOR:         1 1 1 1 1 1 0 1 1 0 1 1 0 1 1    (twelve 1s)
Length:      6, 2, 2, 2

MatchValue:  12 + 2^6 + 2^2 + 2^2 + 2^2 = 88
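The worked example can be reproduced with a short Python sketch. The scoring rule used here (the number of 1s in the XOR plus 2 raised to the length of every run of two or more 1s) is inferred from the example's total of 88 and should be read as an assumption, as should the function names. The second function applies the circular shift described in the Antigen/Antibody item and keeps the best score.

```python
from itertools import groupby


def match_value(antigen: str, antibody: str) -> int:
    """Score one alignment: count of XOR 1s plus 2**length per run of >= 2 ones."""
    if len(antigen) != len(antibody):
        raise ValueError("strings must have the same length")
    xor = [int(a != b) for a, b in zip(antigen, antibody)]
    score = sum(xor)
    for bit, run in groupby(xor):
        length = len(list(run))
        if bit == 1 and length >= 2:
            score += 2 ** length
    return score


def best_circular_match(antigen: str, antibody: str) -> int:
    """Try every circular right shift of the antibody and keep the best score."""
    best = 0
    for shift in range(len(antibody)):
        # Rotate right by `shift` bits; shift 0 leaves the antibody unchanged.
        rotated = antibody[-shift:] + antibody[:-shift] if shift else antibody
        best = max(best, match_value(antigen, rotated))
    return best


print(match_value("011000011110110", "100111000101101"))  # 88
print(best_circular_match("1000111010", "0010101110"))
```

For the bit-shifted antibody example, the rotation by five bits makes the antibody the exact complement of the antigen, so that alignment yields the highest score.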

Hypermutation: In multi-point mutation, each selected bit is flipped, and in sub-string regeneration, all the elements between two chosen points are flipped.
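The two operators can be sketched as follows. This is an illustration only: how bits are selected for multi-point mutation is not specified in the text, so the sketch assumes each bit is selected independently with a given probability, and the function names and the half-open interval convention are choices made for the example.

```python
import random


def flip(bit: str) -> str:
    """Flip a single '0'/'1' character."""
    return "1" if bit == "0" else "0"


def multi_point_mutation(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate` (assumed selection rule)."""
    return "".join(flip(b) if rng.random() < rate else b for b in bits)


def substring_flip(bits: str, start: int, end: int) -> str:
    """Flip every bit in the half-open interval [start, end)."""
    middle = "".join(flip(b) for b in bits[start:end])
    return bits[:start] + middle + bits[end:]


rng = random.Random(0)  # seeded so that runs are repeatable
print(multi_point_mutation("11111111110000000000", 0.1, rng))
print(substring_flip("0000000000", 2, 5))  # 0011100000
```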

8.4 Running the System

99 binary antigens were used to immunize the system. The test population was then presented to the AIS. Learning was turned off during the testing phase, so the system shows its secondary immune response. In other words, the test shows whether or not the antibodies recognize the antigens.

50 iterations were performed for the immunization process, during which the antibody population increased from 10 to 28. The secondary response was then tested by presenting the antigens shown below.

TEST 1*    1111111110000000000
TEST 2     0000111000110010001
TEST 3     1110010010010010010
TEST 4*    0000000001111111111
TEST 5     1010101000101001110
TEST 6     1111001010100110100
TEST 7*    0000011111111110000

TEST 1, 4 and 7 are the original antigens used in the primary response. TEST 2 and 3 are modified versions of TEST 1. Similarly, TEST 5 and 6 are noisy versions of TEST 4. The AIS should be able to identify TEST 2, 3, 5 and 6 without any difficulty [9].

9. Dynamic Behavior Arbitration using AIS


Akio Ishiguro et al. proposed an inference-making system inspired by the immune system of living organisms and applied it to the behavior arbitration of an autonomous mobile robot, because conventional AI systems are brittle in dynamically changing environments. They also try to evolve the affinities among antibodies using genetic operators.

Because functional decomposition, as used in conventional AI, has limitations, much attention has been focused on behavioral decomposition approaches. In behavior-based approaches, however, arbitration among the competence modules is difficult.

To overcome such difficulties, Maes proposed a behavior network system in which an action suitable for the current situation and the given goals emerges from the interaction between different competence modules. Akio Ishiguro et al. approached this problem from an immunological point of view, as shown in Figure (9).

Figure (9) Architecture of Algorithm [9]

As shown in the figure, the current situation (for example, the distance and direction to a detected obstacle) acts as the antigen, the competence modules act as antibodies, and the interactions between modules correspond to the stimulation and suppression between antibodies. The basis of this approach is that the best possible antibody is selected for the antigen.

Figure (10) Immune Networks [8, 9]

To verify the ability of their proposed method, the authors carried out simulations. There are three kinds of objects in the simulated environment: a] predators, b] obstacles and c] foods. For quantitative evaluation, the following assumptions are made:
1. For movement, the immunobot consumes an amount of energy Em.
2. If the immunobot is captured by a predator, an amount of energy Ep is consumed.
3. If the immunobot collides with an obstacle, an amount of energy Eo is lost.
4. If the immunobot obtains food, it gains an amount of energy Ef.
5. To avoid over-charging, the obtain-food behavior does not emerge once sufficient food has been obtained.

The predators attack the immunobot only if it is within a predefined range. So, to survive, the immunobot needs to select the best possible antibody. The figure below shows the structure of the immunobot used in the simulations. It is equipped with external and internal detectors. The external detectors are sensors in eight directions that detect predators, obstacles and food. Each detector also reports the distance as near, mid or far. The internal detector detects the energy level.

Figure (11) Structure of Robot [8]

9.1 Description of Antibodies

Each prepared competence module is an antibody. The important task for the immunobot is to select the best antibody for the antigen, and this depends on how the antibodies are described. The selection should be made in a bottom-up manner with proper communication between the modules. The structure of the paratope and epitope is crucial for the specificity, or identity, of any particular antibody.

The paratope is the desirable condition and the epitope is the disallowed condition. The paratope and idiotope are divided into three parts: obstacles, direction and distance. A typical inference/consensus system adopts a condition-action description, as in fuzzy inference, whereas the proposed system uses a condition-action-condition description. This provides decentralized dynamic inference in a bottom-up manner.

Figure (12) Antibody Description [9]

A prepared antibody for an antigen can be as follows: the antibody is activated if the immunobot detects food in the front direction at mid range, and it makes the immunobot move forward to pick the food up.

Figure (13) Prepared Antibody [9]

However, if a predator exists in front at near or mid range, or if food is in near range, the prepared antibody may hesitate to be activated. The other antibodies are designed along similar lines.

9.2 Dynamics

In this model, the authors allow only one antibody to be activated, once it surpasses a prespecified threshold. The concentration of each antibody is introduced as a state variable.

(23)

where a_i(t) is the concentration of antibody i, which varies with time, and m_ij is the matching ratio between antibodies i and j.

9.3 Basic mechanism of the proposed inference making network

Four antigens are listed in the figure, and the five listed antibodies mainly participate in the inference/consensus making. For instance, antibody 1 means that if the immunobot detects food at far range in the front direction, it is allowed to move forward. If, however, the immunobot identifies food in near range, a predator in front, or a high energy level, this antibody stimulates the other antibodies whose paratopes display such conditions.

Figure (14) Antibody Selection [7,9]

Consider the case in which the current energy level is high: antibodies 1, 2, 3, and 5 are stimulated by the antigens. The concentrations of these antibodies are incremented in accordance with their antigens. The interaction among the antibodies of the immune network is important. In the end, antibody 5 is selected in the figure. In the case of a low current energy level, antibody 3 is selected [9].

10. Latest Immune Models and Hybrid Approaches


10.1 Danger Theory based algorithms

In 2002, Aickelin and Cayzer identified the following aspects of danger theory (DT) to be included in an AIS:

1. An appropriate number of APCs to display danger signals needs to be modeled.
2. A danger signal is either positive or negative, representing the presence or absence of the signal.
3. As far as biology is concerned, the danger zone is spatial, but in a computational model other notions, such as temporal proximity, can be used.
4. Killer cells sometimes cause the death of self cells; this should not generate further danger signals.
5. The priming of killer cells via APCs should be considered in AIS models.
6. An antibody migration rule should specify the concentration of antibodies receiving signal 1 and signal 2 from a given APC. DT depends on the concentrations of different immune cells.

These aspects are used to build a better AIS for anomaly detection, in which non-self does not trigger an immune response without a danger signal [7].

Figure 15 (a) One Signal Model [7]

Figure 15 (b) Two Signal Model [7]



Figure 15 (c) APC controlling IR [7]

Figure 15 (d) INS with third signal [7]

Figure 15 (e) danger in control through zoning[7]

Figure 15 (f) Control through INS and zoning [7]


In 2010, danger theory was applied to the online supervised two-class classification problem. The algorithms of the proposed method are as follows:

Algorithm 1 Danger theory based immune algorithm.

1. Initialize the antibody population and the memory
2. While the stopping conditions are not met do
3.     For i = 0 to the size of the antigen population do
4.         Present the antigen to the system
5.         The presented antigen creates a danger zone
6.         The general antibody population receives signal 0 from the presented antigen
7.         The general antibody population receives signal 1 from the danger zone
8.         Select the antibodies that receive both signal 0 and signal 1 as the stimulated antibodies
9.         For all antibodies belonging to the stimulated antibodies do
10.            Change the status of the antibody
11.            Calculate the interaction between the antibody and the antigen
12.        End for
13.        Suppress the antibody population
14.        Decrease the danger from the antigen that has already been considered
15.        For all antibodies belonging to the stimulated antibodies do
16.            If the antibody's stimulation reaches a certain threshold value then
17.                Apply the clonal selection algorithm
18.            End if
19.        End for
20.    End for
21.    Check the stopping criteria
22. End while
23. Output the memory of antibodies that were selected via clonal selection and met the threshold value

When the learning algorithm has ended, the output antibodies are used to classify unknown antigens. The process is simple: an unknown antigen is assigned to the same class as the antibody with which it has the highest affinity.

Learning Algorithm explained:

1. Initialization: The algorithm starts with a random antibody population whose members are assigned labels. Their status values are set to zero and the memory is set to the empty set.

2. Two kinds of signals: The danger signals are co-stimulation signals, termed signal 1, while the other signals are termed signal 0. The antibody population is divided into two parts: a] general and b] memory. The memory antibodies do not react with antigens. They are the fixed memory of antigens and are changed only when they are suppressed. The general antibodies receive signal 0 when presented with an antigen, so an antibody can detect the stimulus of the current antigen, whereas signal 1 is perceived only when a danger zone is created. The antibodies that receive both signals are stimulated and can change their status.

Algorithm 2

1. Antibody stimulated = antibody stimulated + 1
2. If antibody label == antigen label then
3.     Antibody-antigen reaction = 1
4. Else
5.     Antibody-antigen reaction = -1
6. End if
7. Antibody relevance = antibody relevance + antibody-antigen reaction
8. Variable danger zone (var) = affinity between antigen and antibody
9. Antibody stimulation = antibody stimulation + antibody-antigen reaction * var
10. Var = stimulated antibody population
11. Antigen danger = Var * var * antibody stimulation

Algorithm 3

1. If antibody stimulation (as) < threshold value (t) then
2.     Delete the antibodies whose stimulation is below the threshold
3. Else if as <= t then
4.     Antibody label = -1 * antibody label
5.     Antibody stimulation = 0
6.     as = 0
7. Else
8.     Clone the antibody that has high affinity
9.     Perform the mutation process on the cloned antibody
10.    Convert the antibody to a memory antibody
11. End if

Algorithm 4

1. Randomly pair the stimulated antibodies
2. For all pairs do
3.     Calculate probability p1
4.     If random < p1 then
5.         Delete the antibody with high interactivity
6.     End if
7. End for
8. Group the memory antibodies into pairs
9. For all pairs do
10.    Calculate probability p2
11.    If random < p2 then
12.        Remove the memory antibody with high affinity
13.    End if
14. End for

10.2 Combining Dendritic Cells and Danger Theory

In 2007, Yeom combined danger theory (DT) and dendritic cells (DCs) to form a model for signal precategorization. The principles are as follows:

1. Pathogen-associated molecular patterns (PAMPs) are expressed by bacteria and can be identified by DCs, which change their behavior in response.

2. Danger signals are generated by the unplanned death of cells (necrosis). The sudden, chaotic breakdown of the internal components of a cell causes danger signals to surface. DCs are sensitive to the concentration of danger signals. The presence of a danger signal may or may not indicate a change, but the probability of a change is higher than in normal situations.

3. Safe signals are due to the normal death of cells for regulatory reasons; this tightly controlled process results in the release of various signals into the tissue. Such safe signals give rise to suppression signals.

4. Inflammatory cytokines can be released as a result of injury, although inflammation alone is not enough to stimulate DCs.

DCs can stimulate naive T cells and have a number of functional properties (Yeom, 2007). The first function of DCs is to inform the immune system to respond when there is an attack.

DCs perform different functions depending on their state of maturation. Modulation between these states is facilitated by the detection of signals within the tissue, namely danger signals, apoptotic signals and inflammatory signals.

In tissue, DCs collect antigen and experience danger signals from necrosing cells and safe signals from apoptotic cells. Maturation of DCs occurs in response to the receipt of these signals.

According to Yeom (2007), if there is a concentration of danger signals in the tissue at the time the antigen is picked up, the DC becomes fully mature. Conversely, if there are safe signals, the DC matures differently [7].

10.3 Multilevel Immune Learning Algorithm (MILA)

This algorithm uses recognition mechanisms at both the T-cell and the B-cell level. It is inspired by the communication and processes of the T-cell-dependent humoral immune response. In the biological immune system, B cells recognize antigens through immunoglobulin receptors on their surfaces, but they do not proliferate and differentiate until they receive a signal from Th (helper T) cells.

For Th cells to allow B cells to proliferate and differentiate, the Th cells must themselves be stimulated, which happens only when they recognize antigens in the context of the major histocompatibility complex (MHC). B cells can also be suppressed by suppressor T (Ts) cells. The activated B and T cells move to the lymph nodes, where proliferation, mutation, selection, differentiation and the death of B cells take place in germinal centres (GCs). MILA incorporates an abstraction of these events to develop a detection algorithm. The algorithm consists of initialization, recognition, evolutionary and response phases. In the initialization phase, the detection system is trained to recognize self. The result of initialization is used to produce detectors, analogous to the populations of Th, Ts and B cells that participate in the humoral immune response. There are three levels:

1. APC level, which corresponds to the highest one.
2. B-cell level, the intermediate one.
3. Th-cell level, the bit level for local patterns.

MILA uses the rcb matching rule for real-valued representations. A Th cell uses a sliding window to obtain its w elements, whereas a B cell uses w randomly chosen elements. The concepts of prematuration and crossover operators can also be used.

Another feature of MILA is positive selection by Ts cells, which is based on self samples.

The evolutionary phase in MILA is a process of refining the detector set, provided that the earlier detection rates can be evaluated. This phase involves cloning, mutation, and selection; however, cloning in MILA is targeted: only those detectors that are activated in the recognition phase can be cloned [7].

10.4 Combining Negative Selection and Classification technique

In anomaly detection, only positive samples (self samples) are available at the training stage. However, most conventional classification algorithms need both self and non-self samples. To allow conventional algorithms to be used when only self samples are available, Gonzalez (2002) proposed a hybrid algorithm that creates synthetic non-self samples from a set of self samples. The algorithm uses negative selection (NS) to develop a detector set that covers the non-self space, and these detectors are then used to generate samples for the non-self class, which makes conventional classification algorithms applicable.

Figure (16) NS-SOM in generation classifier dataset [7]

In particular, negative samples are generated from the positive samples. Samples from both classes are then used to train a self-organizing map (SOM) neural network. An SOM, composed of nodes or neurons (which are able to identify the input type), is a type of artificial neural network (ANN) that is trained to produce a low-dimensional representation, called a map, of the input space, here the self/non-self feature space of the training samples [7,8].


The three phases of NS-SOM are shown in figure below:

Figure (17) NS-SOM Model Structure [7,8]


11. Immune Networks and Negative Selection Based algorithm


An algorithm combining negative selection and Ab-Ab (antibody-antibody) communication was developed by Prashant Rao (2008) for the navigation control and path mapping of an autonomous mobile robot, the Khepera II.

The following is the step by step formulation of the algorithm:

1. Initialization: A network of immune cells is initialized (there is a superset of 64 antibodies, numbered 0 to 63). The initial concentrations of the antibodies are set and the robot is reset. A subset of 20 antibodies is chosen at random. The stimulation and suppression between antibodies are defined using a basic matching function. The first two sensors of the Khepera II robot are not switched on.

2. Population Loop:

i) Antigenic Recognition: The information from the sensors is collected and an antigen is formed from it. The match between the antigen and the randomly selected antibodies is determined and affinities are assigned. Each antigen stimulates many antibodies, but only one matches perfectly and is therefore selected.

ii) Self-Nonself Determination: The antigen is checked for a match against the self set. If it matches, innate memory takes over, the system is given a standard solution and the loop executes again; otherwise the system moves on to the next step.

iii) Network Communications: The interactions between the randomly selected antibodies are calculated.

iv) Dynamics: The stimulation minus the suppression, added to the affinity and reduced by the natural death coefficient, gives the overall stimulation. The product of this stimulation and the concentration of an antibody gives the rate of change of its concentration with time. The antibody with the highest concentration is sent to a critic, which rewards or penalizes it, and the affinities are modified accordingly.

3. Feedback: When a penalty is allotted, the helper T cell is activated and its contribution is calculated at each step. The adaptation function is determined by the interaction between the T cell and the other cells in the network; it modifies the affinities between antibodies using a suitable learning rate.

4. Steps 2 and 3 are repeated until the convergence criterion is met.
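The Dynamics step can be sketched numerically as follows. This is a rough illustration under stated assumptions, not the formulation used in [6]: the affinity matrix convention (`affinity[j][i]` is how strongly antibody j stimulates antibody i, and `affinity[i][j]` how strongly i is suppressed through j), the per-antibody antigen affinity `antigen_match`, the explicit Euler update and all names are hypothetical choices made for the example.

```python
from typing import List, Sequence


def concentration_rates(
    conc: Sequence[float],
    affinity: Sequence[Sequence[float]],
    antigen_match: Sequence[float],
    death_rate: float,
) -> List[float]:
    """Rate of change of each antibody concentration."""
    n = len(conc)
    rates = []
    for i in range(n):
        # Assumed convention: affinity[j][i] weights stimulation of i by j.
        stimulation = sum(affinity[j][i] * conc[j] for j in range(n))
        suppression = sum(affinity[i][j] * conc[j] for j in range(n))
        overall = stimulation - suppression + antigen_match[i] - death_rate
        rates.append(overall * conc[i])  # rate = overall stimulation * concentration
    return rates


def euler_step(conc: Sequence[float], rates: Sequence[float], dt: float) -> List[float]:
    """Advance the concentrations by one explicit Euler step, clamped at zero."""
    return [max(0.0, c + dt * r) for c, r in zip(conc, rates)]


conc = [0.5, 0.5]
affinity = [[0.0, 0.2], [0.0, 0.0]]
rates = concentration_rates(conc, affinity, antigen_match=[1.0, 0.0], death_rate=0.1)
print(euler_step(conc, rates, dt=0.1))
```

In this toy setting the antibody that matches the antigen gains concentration, which is what makes it the candidate passed on to the critic.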

Figure (18) Algorithm based on Negative selection and Ab-Ab interaction [6]

Figure (19) Algorithm based on Negative selection and Ab-Ab interaction [6]

11.1 Latest Dendritic Cell Algorithm Inspired from Danger Theory

Danger theory states that danger signals are generated to activate APCs. APCs stimulate T-helper cells, which finally gives rise to the adaptive immune response. The danger signals are detected by dendritic cells, which exist in three states, namely immature, semi-mature and mature. If the detected signal is safe, the dendritic cell becomes semi-mature upon presenting the antigen to the T cell. If a danger signal is found, the dendritic cell becomes mature and the T cell becomes reactive to the antigen.

The dendritic cell algorithm takes into consideration safe, danger and PAMP signals [11].

ALGORITHM:

input:  S = set of data items to be labeled safe or dangerous
output: L = set of data items labeled as safe or dangerous

Start
    Generate an initial population of dendritic cells (DCs), D
    Create a set to contain the migrated DCs, M
    forall data items in set S do
        Select a set of DCs, P, by randomly sampling from D
        forall DCs in set P do
            Add the data item to the DC's collected list
            Update the safe, danger and PAMP concentrations
            Update the cytokine concentrations
            If the concentration is above the threshold, move the DC from D to M and generate a new DC in D
        end
    end
    forall data items in S do
        Count the number of times the data item is presented by a mature DC and by a semi-mature DC
        Label the data item as safe if it is presented by more semi-mature DCs than mature DCs
        Add the data item to the labeled set L
    end
Stop [11]
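The final labeling stage of the algorithm can be illustrated in isolation. The sketch below assumes that the migration stage has already produced a list of (data item, DC state) presentation records; that input format, the treatment of ties as dangerous, and the function name `label_items` are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def label_items(presentations: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, str]:
    """Label each item by comparing semi-mature and mature presentations."""
    mature: Counter = Counter()
    semi_mature: Counter = Counter()
    for item, state in presentations:
        if state == "mature":
            mature[item] += 1
        elif state == "semi-mature":
            semi_mature[item] += 1
        else:
            raise ValueError(f"unknown DC state: {state!r}")
    labels = {}
    for item in set(mature) | set(semi_mature):
        # Safe only if semi-mature presentations outnumber mature ones.
        safe = semi_mature[item] > mature[item]
        labels[item] = "safe" if safe else "dangerous"
    return labels


records = [
    ("proc_a", "semi-mature"),
    ("proc_a", "semi-mature"),
    ("proc_a", "mature"),
    ("proc_b", "mature"),
]
print(label_items(records))
```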


11.2 Latest TLR (toll-like receptor) Algorithm

The algorithmic steps of the TLR algorithm, as described by Aickelin and Greensmith (2007) and designed for anomaly detection in computer networks, are as follows:

1. Collect the set of system calls that are made in the training data.

2. Collect the corresponding signal values.

3. Determine the complement sets of the sets obtained in steps 1 and 2.

4. Generate a set of immature DCs (iDCs) with signal receptors selected randomly from the complement signal set and with antigen receptors selected randomly from the complement system call set.

5. Similarly, generate naive T cells (nTCs) with antigen receptors drawn randomly from the complement system call set.

6. Expose the immature DCs to the sample signals and antigens.

7. If an iDC matches a signal, it matures (mDC) and migrates.

8. If an iDC has not migrated by the end of its lifetime, it becomes a semi-mature DC (smDC) and then migrates.

9. Migrated smDCs and mDCs present their antigens and try to match nTCs.

10. If an mDC presenting an antigen matches a naive T cell, the nTC is activated and an anomaly is reported.

11. If an smDC presenting an antigen matches an nTC, it kills the nTC in order to lower false positives.

12. Migrated smDCs and mDCs and killed nTCs are replaced by new cells as per steps 4 and 5 [7].

Figure (20) Systematic Overview of TLR algorithm [7]


12. Recent Developments and Real world Applications


Solving problems using Immunological Computation

To apply knowledge of the biological immune system to real-world problems, one must first select an immune algorithm according to the type of problem. The first step is to identify the elements involved in the problem and how they can be represented in terms of a particular AIS. To encode such entities, a representation such as bit strings or real-valued vectors can be chosen. Then an affinity measure is selected in line with the matching rules employed. The next step is to decide which AIS is best suited to create a set of entities that can provide a good solution to the problem in its context [7].

Figure (21) Problem Solving Using AIS [7]


12.1 Virus Detection

Kephart (1994) proposed an immunologically inspired approach to detecting viruses in computer systems. In this approach, known viruses are identified by their code sequences and unknown viruses are detected by their unusual behavior in the system. The virus detection software continuously scans the system to detect changes. Such changes trigger the release of decoy programs whose sole purpose is to become infected by the virus [7].

Figure (22) Flow diagram of Kephart's approach to virus detection [7]

A diverse suite of decoy programs is kept at different locations in the system's memory to detect viruses. If one or more decoy programs are modified, it is certain that a virus has entered the system, and each modified decoy program contains a sample of the virus. The infected decoy programs are processed by the signature extractor to generate a recognizer for the respective virus.

The signature extractor also extracts the pattern by which the virus attaches to its host, so that the host can be repaired if necessary. In addition, it must select the virus signature so as to avoid false positives and false negatives: the signature must be found in each sample of the virus and must be very unlikely to be found in uninfected programs. Once the best possible signature has been found from the infected programs, it is compared with a half-gigabyte corpus of legitimate programs to make sure that it produces no false positives. The repair information is checked by testing it on samples of the virus and then again by a human expert [7].

12.2 Immunogenetic Approaches in Intrusion Detection
Gonzalez (2002) proposed negative selection with detector rules to detect attacks by monitoring network traffic. A real-valued representation is used to evolve hyper-rectangular detectors, interpreted as if-then rules, that give a high-level characterization of the self/non-self space. The experiments used data from the 1999 Defense Advanced Research Projects Agency (DARPA) intrusion detection evaluation dataset. The AIS approach produced detectors that gave a good estimate of the amount of deviation from normal [7].
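A hyper-rectangular detector read as an if-then rule can be sketched as follows. The class and function names, and the two normalized traffic features in the usage lines, are hypothetical; the sketch shows only how such a detector is evaluated, not how Gonzalez's method evolves it.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RectDetector:
    """One (low, high) interval per feature. Read as a rule:
    IF low_1 <= x_1 <= high_1 AND ... THEN x is non-self."""
    bounds: Tuple[Tuple[float, float], ...]

    def matches(self, x: Sequence[float]) -> bool:
        if len(x) != len(self.bounds):
            raise ValueError("sample and detector dimensions differ")
        return all(lo <= v <= hi for v, (lo, hi) in zip(x, self.bounds))


def is_anomalous(x: Sequence[float], detectors) -> bool:
    """A sample is flagged as soon as any detector covers it."""
    return any(d.matches(x) for d in detectors)


# Hypothetical detector covering "high packet rate and low payload size".
detector = RectDetector(((0.8, 1.0), (0.0, 0.2)))
flagged = is_anomalous([0.9, 0.1], [detector])
normal = is_anomalous([0.5, 0.1], [detector])
```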

12.3 Danger Theory in Network Security
Aickelin (2002) first proposed applying danger theory to network security. The proposed system behaves like dendritic cells (DCs), looking for danger signals such as a sudden increase in network traffic or an abnormally high flow of error messages. If such signals rise above a threshold, an alarm is raised [7].
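The thresholding idea can be sketched in a few lines. The signal definitions (a jump in traffic between consecutive samples, and an error-message count), the function name and the threshold values are assumptions made for illustration and do not come from [7].

```python
def danger_alarms(traffic, errors, jump_threshold, error_threshold):
    """Return the time steps at which a danger signal exceeds its threshold.

    traffic: traffic volume per time step
    errors:  error-message count per time step
    """
    alarms = []
    for t in range(1, len(traffic)):
        jump = traffic[t] - traffic[t - 1]   # sudden increase in traffic
        if jump > jump_threshold or errors[t] > error_threshold:
            alarms.append(t)
    return alarms


alarms = danger_alarms(
    traffic=[10, 12, 11, 60, 58],
    errors=[0, 1, 0, 2, 9],
    jump_threshold=20,
    error_threshold=5,
)
```

Here step 3 is flagged because of the traffic jump and step 4 because of the burst of error messages.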

12.4 Robotics and Control
The robot control work of Ishiguro et al. (1996, 1998), Watanabe et al. (1998, 1999) and Lee et al. (1999) focused on the development of a dynamic, decentralized consensus-making mechanism based on immune network theory. In a dynamic environment, the immunoid is able to collect garbage. In this metaphor, antibodies are the potential behaviors of the immunoid, and antigens are environmental inputs such as garbage, walls and the home base. To decide on the best behavior, the immunoid matches antigens to antibodies [7].

Vertebrate immune systems inspire computer scientists and engineers to create new algorithms for solving real-world problems. The four main AIS algorithms are: 1. Negative selection algorithms 2. Artificial immune networks 3. Clonal selection algorithms 4. Danger theory and dendritic cell algorithms

Recent developments include AIS applications in computer security, optimization, data mining, fault detection, etc. Several authors have surveyed these developments, for example Garrett (2005), who reviewed the developments before 2005 and attempted to evaluate AIS against the criteria of distinctiveness and effectiveness. Hart and Timmis (2010) discussed the application of AIS and proposed a set of problem features suited to AIS applications. Some of the recently developed models and hybrid approaches are explained below:

12.5 Conserved Self Pattern Recognition Algorithm (CSPRA)
This recent AIS algorithm is inspired by the Pattern Recognition Receptors (PRR) model. According to the PRR model, self/non-self discrimination requires stimulation from APCs. APCs, in turn, are not stimulated unless they are activated via PRRs that identify molecular patterns on bacteria. The PRR model therefore adds additional layers of molecular patterns to the discrimination. CSPRA (2010) naturally incorporates the negative selection algorithm: anomaly detection in CSPRA combines the results of APC self pattern recognition and T-cell negative selection. Self pattern recognition by APCs is performed only when the antigen is not detected by the T-cell negative selection algorithm. The generation of the APC detector involves two major steps:

1. Depending on the function between the antigen and its feature space, define the conserved self pattern, which can be predefined from the data. The pattern may come from empirical laboratory data, or it can be calculated automatically using the Pearson correlation coefficient between the column of each attribute and the corresponding label.

2. By evaluating the maximum, minimum and mean of all the values in the feature space at loc1, loc2, ..., generate the APC detector R = {(loc1, min, max, mean), (loc2, min, max, mean), ...} within the conserved self pattern of the features located at loc1, loc2, ...

Compared with the classical negative selection algorithm, CSPRA shows better and more promising results, reducing the number of false errors without increasing complexity [3, 4, 13].
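The two steps of APC detector generation can be sketched as below. This is an illustration under stated assumptions, not the published CSPRA implementation: the correlation cut-off `min_abs_corr`, the function names and the toy data are hypothetical, and the conserved locations are taken to be the attributes whose absolute Pearson coefficient with the label reaches the cut-off.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0            # a constant column carries no correlation
    return cov / math.sqrt(vx * vy)


def conserved_locations(samples, labels, min_abs_corr):
    """Step 1: attribute columns strongly correlated with the label."""
    n_features = len(samples[0])
    return [j for j in range(n_features)
            if abs(pearson([s[j] for s in samples], labels)) >= min_abs_corr]


def apc_detector(self_samples, locations):
    """Step 2: R = {(loc, min, max, mean), ...} computed over self samples."""
    detector = []
    for loc in locations:
        column = [s[loc] for s in self_samples]
        detector.append((loc, min(column), max(column), sum(column) / len(column)))
    return detector


samples = [[1.0, 5.0], [2.0, 5.5], [8.0, 5.2], [9.0, 5.4]]
labels = [0, 0, 1, 1]                      # 0 = self, 1 = non-self
locs = conserved_locations(samples, labels, min_abs_corr=0.8)
self_only = [s for s, y in zip(samples, labels) if y == 0]
R = apc_detector(self_only, locs)
```

In this toy data only the first attribute tracks the label, so the detector holds a single entry for location 0.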

12.6 Recent Complex Artificial Immune Systems (CAIS)
CAIS consists of five layers, namely the encounter layer, preprocessing layer, MHC layer, competitive layer and stimulation layer. The antigen and the antibody are the input and the output, respectively. When the system encounters an antigen, there are two ways in which it can be recognized: direct recognition by B cells, or recognition through the APC layers. In the latter case the input is given to the APC layer, and the molecular complex pattern that is formed is passed to the MHC layer for processing. The information coming from the APC is transformed and translated in the MHC layer and fed to the Th layer. In the Th layer, the cells receive different responses from the MHC layer and form a set of Th cells that respond better to the input antigens. B cells become activated through stimulation from the Th layer and also by the input pattern. An antibody is the difference between an input and the weights associated with the B cells. Ts cells modulate the weights associated with the immune cells located in the neighborhood set. Compared with binary immune systems, CAIS can recognize patterns invariantly under translation, rotation and scaling. It can be applied to handwriting pattern recognition problems [11, 13].

12.7 Hybrid Approaches
BAIS (Bayesian Artificial Immune System) is obtained by replacing the mutation and cloning operators with a probabilistic model, for solving optimization and multiobjective optimization problems. BAIS is capable of capturing the most relevant interactions between the problem variables. The algorithm adopts a population-based search strategy and uses a Bayesian network to implement the probabilistic model. Once the population is initialized, the algorithm loops until a stopping condition is met, performing the following steps in each iteration:

a. Using a suitable selection technique, select the best solutions from the current population.
b. Build the Bayesian network that best fits the selected solutions.
c. Sample new antibodies from the network.
d. Remove the antibodies with lower fitness, as well as those that are too similar to one another.
e. Insert randomly generated antibodies among the selected ones to maintain diversity [13].
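The loop structure of steps a to e can be sketched as follows. To keep the example short, the Bayesian network is replaced by a much simpler stand-in: independent per-bit marginal probabilities, which capture no interactions between variables. The OneMax fitness function, all parameter values and the function names are likewise assumptions made for illustration, so this is not BAIS itself.

```python
import random


def onemax(bits):
    """Toy fitness function: number of ones in the bit string."""
    return sum(bits)


def bais_like_search(n_bits=20, pop_size=30, n_select=10, n_random=5,
                     generations=30, seed=0):
    rng = random.Random(seed)

    def random_antibody():
        return tuple(rng.randint(0, 1) for _ in range(n_bits))

    population = [random_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        # a. select the best antibodies
        selected = sorted(population, key=onemax, reverse=True)[:n_select]
        # b. fit the probabilistic model (stand-in: smoothed per-bit marginals,
        #    NOT a Bayesian network)
        probs = [(sum(ab[j] for ab in selected) + 1) / (n_select + 2)
                 for j in range(n_bits)]
        # c. sample new antibodies from the model
        sampled = [tuple(1 if rng.random() < p else 0 for p in probs)
                   for _ in range(pop_size)]
        # d. drop exact duplicates (a crude form of similarity suppression)
        #    and keep only the fittest antibodies
        merged = sorted(set(selected + sampled), key=onemax, reverse=True)
        survivors = merged[:pop_size - n_random]
        # e. insert random antibodies to maintain diversity
        population = survivors + [random_antibody() for _ in range(n_random)]
    return max(population, key=onemax)


best = bais_like_search()
```

Because the selected antibodies are carried over in step d, the best fitness in the population never decreases from one iteration to the next.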

BAIS can be applied to feature selection using the wrapper approach. It can handle building blocks, both non-overlapping and overlapping, in the optimization of the Trap-5 function. Multiobjective knapsack optimization can also be solved very efficiently by the BAIS algorithm. This approach is termed the Multiobjective Bayesian Artificial Immune System (MOBAIS), and it can be applied to classification problems. It is capable of identifying and preserving the building blocks effectively while searching for and finding diverse, high-level local optima. Practical applications show that it yields parsimonious and accurate results. Furthermore, the Bayesian networks were enhanced by learning, so that the network is not synthesized at each iteration; only the crucial parameters, namely the conditional and marginal probabilities, are updated at each iteration [13].

Unstructured damage classification can be performed by an algorithm based on data clustering and AIS pattern recognition. This technique clusters the training data into a specified number of clusters and generates the initial memory cell set. Combined with AIS pattern recognition algorithms, it then evolves the memory cells. AIS with SVM can be used for fault diagnosis of induction motors; here AIS is used to tune the kernel and penalty parameters to improve classification accuracy.

In the immune multiagent recognizer, each agent recognizer is an immune RBF neural network model. In this model, the antigen is the input and the antibodies are the compression cluster mapping, that is, the hidden layer. The output weights can be determined using the least squares algorithm. Each level of the recognition system contains a recognizer that can recognize one sort of antigen. A multiple-valued immune network classifier (MVINC) based on immune network theory was applied to the classification of remote sensing images, implementing immune memory using logic theory and immune theory.

EaiNET combines AIS with Particle Swarm Optimization (PSO). It uses the learning mechanism of PSO, in which each individual learns from the best of the social population, and this increases the convergence rate. A Radial Basis Function (RBF) artificial neural network and AIS are combined to compress the data in a set; this tool is called aiNET. It can also be used to determine the number of RBFs in an ANN, which is then termed an RBFNN.

A fault diagnosis model based on the immune evolution algorithm was proposed. Its design includes a diversity evaluation, which is very complex; because fault detection is hard, a fault calculation technique integrating induction and static methods was designed [13]. In other work, the computational properties of degenerate recognition systems were investigated by combining agent-based modeling and UML. This work shows that it is possible to identify the degenerate receptors and that recognition occurs more quickly than in a non-degenerate system.

In the resource limited AIS, the Network Affinity Threshold (NAT) is not calculated during the network evolution process, because the network granularity is determined by NAT and its initial value is calculated from the distance between the antigens. The convergence and stability of the population can be impaired by pure clonal selection and random change operations. The gene immune detection algorithm with a complement operator effectively decreases the false positives that surfaced in the previous gene immune detection algorithm. A vaccine and a complement are also introduced. The number of detectors is reduced and the efficiency of detection is increased. The complement operator overcomes the defect of the gene immune algorithm, and the detection time can be reduced drastically. ICAIS, an incremental clustering method based on the principles of AIS, was also introduced; it uses the primary immune response to identify data belonging to novel clusters and the secondary immune response to identify data belonging to old patterns [13]. A model based on Learning Vector Quantization (LVQ) and the immune network, an extension of the basic Jerne's model, was proposed for pattern recognition [13]. A new classification method, the Hybrid Fuzzy Neuro-Immune Network, is based on the multi-epitope approach. The proposed method shows promising pattern recognition performance.


APPENDIX A

Pattern Recognition in the Immune System using a Growing SOM
[The following project is taken from the Ph.D. thesis of Leonardo De Castro]

function [w,win,cwin,D] = abnet(ag,eps,comp,alfa,beta,pc,pm),
%
% Pattern Recognition in the Immune System using a Growing SOM
% Bipolar Splitting/Pruning Self-Organizing Feature Map (GSOM)
% with Evolutionary Phase
% Main features: bipolar weights, Hamming Distance, Winner takes all
% PHASE I:  Growing followed by Pruning
% PHASE II: Supervised Evolution
%
% function [w,win,cwin,D] = abnet(ag,eps,comp,alfa,beta,pc,pm),
% w    -> weight matrix (Ab population)
% win  -> winner for each Ag (v)
% cwin -> amount of winning of each individual (tau)
% D    -> hamming distance of each Ag with relation to its mapped class
% ag   -> antigen population to be recognized (n2xs2)
% eps  -> ball of stimulation
% comp -> comparison: 1 for comparing complementary chains
%                     0 for comparing identical chains (Hamm. dist.)
% alfa -> amount of bits to be changed
% beta -> number of iterations for reducing the learning rate
%
% Auxiliary functions: COVER, UPDATE, SPLIT, PRUNE, MATCH, CADEIA, TESTGSOM
% The columns of w must be similar to each Ag

if nargin == 2,
   [n2,s2] = size(ag); comp = 0; alfa = 3; beta = 3; pc = 0.6; pm = 0.1;
end;

% Network parameters
ep = 0; alfa0 = alfa; TD = 1;
[np,ni] = size(ag); no = 1; vep = [0];
[C,maxno] = cover(ni,eps); vno = [1:1:no];
disp(sprintf('Coverage of each Ab: %d',C));
disp(sprintf('Initial number of classes: %d',no));
disp(sprintf('Possible number of classes: %d',maxno));
if maxno > np,
   maxno = np;
   disp(sprintf('Maximum number of classes (N): %d',np));
end;
% disp(sprintf('Affinity threshold: %d',eps));
disp(sprintf('Press any key to continue...'));


pause;
[w] = cadeia(ni,no,0,0,1);
max_ep = (beta + 1) * maxno;

% Network Definition
while (ep < max_ep & TD > 0) % & no < maxno),
   cwin = zeros(1,no); k = 0;
   vet = randperm(np); % Asynchronous
   while k < np,
      k = k+1; i = vet(k); D = [];
      [D,mXOR] = match(w',ag(i,:),comp);
      [v(k),ind] = min(D);
      cwin(ind) = cwin(ind) + 1; win(i) = ind;
      w = update(w,ind,alfa,mXOR(ind,:)');
   end;
   TD = sum(v); ep = ep + 1;
   % Growing Phase
   if (rem(ep,beta)==0),
      [w,no,alfa] = split(cwin,win,w,ag,eps,alfa,alfa0);
      vno = [vno no]; vep = [vep ep];
   end;
   % Pruning Phase
   [aux,indmin] = min(cwin);
   if aux == 0,
      [w,no,alfa] = prune(w,indmin,alfa0);
      vno = [vno no];
   end;
   % Learning rate decreasing
   if (ep > 0.05*max_ep & rem(ep,0.05*max_ep)==0),
      if alfa > 1, alfa = alfa - 1; end;
   end;
   disp(sprintf('IT: %4.0d no: %d LR: %d TD: %d',ep,no,alfa,TD));
end;
[v,win,cwin,perc] = testgsom(w,ag,eps);
disp(sprintf('Percentage of misclassified Ag: %3.2f%%',perc));
disp('Minimal Antigenic Affinity (HD)'); disp(v);
disp('Concentration Level: '); disp(cwin);
disp(sprintf('Final Architecture: [%d,%d].',ni,no));
figure(1); plot(vep,vno); hold on; plot(vep,vno,'or');
axis([0 ep+1 0 no+1]);
title('Growing Evolution'); xlabel('Iteration'); hold off;

% --------------------------- %
%    INTERNAL SUBFUNCTIONS    %
% --------------------------- %

% Function CADEIA
function [ab,ag] = cadeia(n1,s1,n2,s2,bip)
if nargin == 2,
   n2 = n1; s2 = s1; bip = 1;
elseif nargin == 4,
   bip = 1;
end;


% Antibody (Ab) chains
ab = 2 .* rand(n1,s1) - 1;
if bip == 1,
   ab = hardlims(ab);
else,
   ab = hardlim(ab);
end;
% Antigen (Ag) chains
ag = 2 .* rand(n2,s2) - 1;
if bip == 1,
   ag = hardlims(ag);
else,
   ag = hardlim(ag);
end;
% End Function CADEIA

% Function SPLIT
function [w,no,alfa] = split(cwin,win,w,ag,eps,alfa,alfa0)
[ni,no] = size(w);
[ind] = find(cwin > 1); % which outputs map more than one Ag
if ~isempty(ind),
   [val,out] = max(cwin); % out = ind(1);
   v = find(win==out);
   Mag = ag(v,:); % matrix of ag mapped in the same output
   D = match(Mag,w(:,out)',0);
   [aux,new] = max(D);
   if aux > eps,
      disp('** Growing **');
      if out == 1,
         w = [Mag(new,:)',w];
      elseif out == no,
         w = [w,Mag(new,:)'];
      else,
         w = [w(:,1:out),Mag(new,:)',w(:,out+1:end)];
      end;
      no = no + 1; alfa = alfa0;
   end;
end;
% End Function SPLIT

% Function TESTGSOM
function [v,win,cwin,k] = testgsom(w,ag,eps),
% disp('** Running the trained network **');
[np,ni] = size(ag); k = 0;
cwin = zeros(1,size(w,2));
for i=1:np,
   [D] = match(w',ag(i,:),0);
   [v(i),ind] = min(D);
   win(i) = ind;
   cwin(ind) = cwin(ind) + 1;
end;
k = 100 * (sum(v > eps) / np);
% End Function TESTGSOM


% Function PRUNE
function [w,no,alfa] = prune(w,ind,alfa0),
[ni,no] = size(w);
disp('** Pruning **');
if ind == 1,
   w = w(:,2:no);
elseif ind == no,
   w = w(:,1:no-1);
else,
   w = [w(:,1:ind-1) w(:,ind+1:no)];
end;
no = no - 1; alfa = alfa0;
% End Function PRUNE

% Function COVER
function [C,no,eps] = cover(len,eps),
fat = fatorial(len); C = 0;
while eps > len,
   disp(sprintf('Ball of stimulation bigger than chain length %d',len));
   eps = input('Enter a new ball of stimulation: ');
end;
for i=0:eps,
   C = C + (fat/(fatorial(i) * fatorial(len-i)));
end;
no = ceil((2^len)/C);
% End Function COVER

% Function FATORIAL
function fat = fatorial(m);
if m == 0,
   fat = 1;
elseif m < 0,
   disp('Negative value');
else,
   fat = prod(1:1:m);
end;
% End Function FATORIAL

% Function UPDATE
function [w] = update(w,ind,alfa,vXOR),
[ni,no] = size(w);
for j = 1:alfa,
   [val,pto] = max(vXOR);
   if val == 0,
      break; % exit loop if vectors are equal
   end;
   w(pto,ind) = -1 * w(pto,ind);
   vXOR(pto) = 0;
end;
% End Function UPDATE

% Function MATCH
function [ms,mXOR] = match(ab,ag,comp)
if nargin == 2,
   comp = 0; % Hamming distance
end;
msc = []; % ms complement
% Converting bipolar (-1,+1) strings to binary ones (0,+1)
ab = hardlim(ab); ag = hardlim(ag);
% Using the XOR operator for calculating the match score
[n1,s1] = size(ab);
ag = ones(n1,1) * ag; % Multiply the Antigen
mXOR = xor(ab,ag);
ms = sum(mXOR');
msc = 1 - ms;
if comp == 1,
   ms = msc;
end;
% End Function MATCH

clear all
alfa = 3; comp = 0; beta = 3; pc = 0.6; pm = 0.1; eps = 15;
ag = [1 2 3 9 8 7 6 5 3 4 5 6 7 8 9 9 5 3 2 1 4];
w = abnet(ag,eps,comp,alfa,beta,pc,pm);

OUTPUT:
Coverage of each Ab: 2069256
Initial number of classes: 1
Possible number of classes: 2
Maximum number of classes (N): 1
Press any key to continue...
IT: 1 no: 1 LR: 2 TD: 13
IT: 2 no: 1 LR: 1 TD: 10
IT: 3 no: 1 LR: 1 TD: 8
IT: 4 no: 1 LR: 1 TD: 7

Percentage of misclassified Ag: 0.00%
Minimal Antigenic Affinity (HD)
6
Concentration Level:
1
Final Architecture: [21,1].
w = 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 -1 -1 -1 -1 -1

[Figure: "Growing Evolution" plot; x-axis: Iteration]
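The first two values in the output above follow directly from the COVER function: the coverage of one antibody is the number of strings within Hamming distance eps of it, and the possible number of classes is the size of the shape space divided by that coverage, rounded up. The short Python check below recomputes both for the run shown (chain length 21, eps = 15); the Python function names are illustrative and are not part of the original MATLAB code.

```python
from math import comb


def coverage(length, eps):
    """Number of bit strings within Hamming distance eps of a given string."""
    return sum(comb(length, i) for i in range(eps + 1))


def possible_classes(length, eps):
    """ceil(2^length / coverage), computed with integer arithmetic."""
    c = coverage(length, eps)
    return -(-(2 ** length) // c)


print(coverage(21, 15))          # 2069256
print(possible_classes(21, 15))  # 2
```

Since the run supplies only one antigen, the program then caps the number of classes at 1, which is the "Maximum number of classes (N)" line in the output.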


Conclusion:
The immune system is a remarkable information processing and self-learning system that offers inspiration for building artificial immune systems (AIS). The field of AIS has achieved a significant degree of success as a branch of Computational Intelligence since it emerged in the 1990s. Research has centered on four major AIS algorithms: 1. Negative selection 2. Artificial immune networks 3. Clonal selection algorithm 4. Danger theory and dendritic cell algorithms. However, other aspects of the biological immune system are motivating computer scientists and engineers to develop new models and problem-solving methods. Although a large number of AIS applications have been developed, their success is still limited by the lack of any exemplars that really stand out as "killer" AIS applications.

