Proceedings of the International Multiconference on Computer Science and Information Technology pp. 25–30 ISBN 978-83-60810-22-4 ISSN 1896-7094

Minimizing overlapping regions for double detectors approach

Andrzej Chmielewski
Faculty of Computer Science, Bialystok Technical University
Wiejska 45a, 15-351 Bialystok, Poland
e-mail: achmielewski@wi.pb.edu.pl

Slawomir T. Wierzchoń
Institute of Computer Science, Polish Academy of Sciences,
and Institute of Informatics, Gdańsk University, Poland
e-mail: stw@ipipan.waw.pl

Abstract—The quality of an immune-based negative selection algorithm strongly depends on the quality of the generated detectors. First, the detectors should cover the nonself space to a degree sufficient to guarantee high detection rates. Second, the duration of classification is proportional to the cardinality of the detector set. A short reaction time to anomalies is especially important in on-line classification systems, e.g. spam filters and intrusion detection systems. Therefore, detectors should be sufficiently general (to reduce their number) as well as sufficiently specific (to detect many intruders). In this paper, we present an improved approach using double, real-valued and binary, detectors, designed to meet the above requirements. We consider two versions of the proposed algorithm, which differ in the degree of overlap they allow between detectors. However, as the presented experiments confirm, too aggressive minimization of overlapping areas is not only computationally complex, but also yields lower detection rates.

I. INTRODUCTION

The Natural Immune System (NIS) protects living organisms against intruders called pathogens. It consists of a number of cells, tissues, and organs that work together to protect the body. The main agents responsible for the adaptive and learning capabilities of the NIS are white blood cells called lymphocytes. They differentiate into two primary types: B- and T-lymphocytes, also called B- and T-cells for brevity.
T-lymphocytes act like the body's military intelligence system, seeking out their targets and directing defenses to lock onto them. B-lymphocytes then destroy the detected invaders to protect the body. Lymphocytes start out in the bone marrow and either stay there and mature into B-cells (this process is called affinity maturation), or leave for the thymus gland, where they mature into T-cells in the process of negative selection.

This process inspired Forrest et al. [8] to formulate the so-called negative selection algorithm (NSA). The NSA is designed to discriminate between the organism's own cells (called self) and all others (called nonself). To achieve this goal, detectors (a counterpart of T-lymphocytes) are first generated. A freshly generated detector (usually created at random) is added to the set of valid detectors only if it does not recognize any self element. A nice feature of the NSA is that it does not need examples of nonself samples (a counterpart of pathogens) in order to detect them.

It is obvious that the efficiency of the NSA strongly depends on the quality of the detectors. Section II makes precise what "quality" really means here. This problem was widely discussed in [14], [13], and [4], where many interesting improvements are described. In those papers, however, only one type of detector was considered: either binary (called b-detectors) or real-valued (v-detectors). In this paper we present a model, called the bv model, containing both b- and v-detectors. This idea was already proposed in [5], but here we focus on the problem of finding efficient detectors that are as unique as possible.

The paper is organized as follows. Section II presents a short description of related work. Our approach is discussed in Section III. Section IV describes the experiments and their results. Lastly, Section V concludes the paper and discusses possible applications of the algorithm to real-life problems.
II. BACKGROUND

The negative selection algorithm [8] relies on the assumption that a universe U (the set of all possible molecules, or items relevant for a given problem) can be divided into two disjoint subsets:

U = S ∪ N,   S ∩ N = ∅,

where S represents the set of self molecules and N the set of nonself molecules. The main task of the NSA is the generation of a set of detectors D which recognizes as many nonself elements as possible. Unrecognized nonself elements form so-called holes (H). Their existence is even desirable for anomaly detection systems, as they generalize the normal behavior of the system under consideration – consult [12] for details. Obviously, the detectors cannot recognize self molecules. Denoting by r(d) the set of molecules recognized by the detector d, we state these requirements as follows:

⋃_{d∈D} r(d) ≈ N,   (∀d ∈ D)[r(d) ∩ S = ∅].

In general, D consists of various detectors which are more or less general. By a "general" detector we understand a detector d such that: (a) the set r(d) is as large as possible, and (b) d recognizes as few as possible of the elements recognized by other detectors, i.e. for any pair of detectors dᵢ, dⱼ the cardinality of the set r(dᵢ) ∩ r(dⱼ) should be close to zero. Lastly, the number of detectors in D should be as small as possible.
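The censoring phase and the two requirements above can be illustrated with a minimal sketch. The example below uses real-valued interval detectors on the unit line [0, 1]; all names, radii, and the self samples are our own illustrative assumptions, not the notation or parameters of the algorithms studied in this paper.

```python
# Illustrative sketch of negative selection with interval detectors on [0, 1].
# SELF_RADIUS and DETECTOR_RADIUS are assumed parameters for this toy example.
import random

random.seed(0)

SELF_RADIUS = 0.05      # assumed radius of the region around each self sample
DETECTOR_RADIUS = 0.08  # assumed recognition radius of a detector


def recognizes(detector, point, radius=DETECTOR_RADIUS):
    """A detector d recognizes a point iff the point lies within d's radius."""
    return abs(detector - point) < radius


def generate_detectors(self_set, n_detectors, max_tries=10_000):
    """Censoring phase: a random candidate is kept only if it recognizes
    no self element, i.e. r(d) ∩ S = ∅."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = random.random()
        # Reject candidates whose recognition region touches a self region.
        if not any(recognizes(candidate, s, SELF_RADIUS + DETECTOR_RADIUS)
                   for s in self_set):
            detectors.append(candidate)
    return detectors


def overlap(d1, d2, radius=DETECTOR_RADIUS):
    """Length of |r(d1) ∩ r(d2)| for two interval detectors on the line."""
    return max(0.0, 2 * radius - abs(d1 - d2))


self_set = [0.2, 0.25, 0.8]
detectors = generate_detectors(self_set, n_detectors=20)

# Requirement (∀d ∈ D)[r(d) ∩ S = ∅]: no detector recognizes a self sample.
assert all(not recognizes(d, s) for d in detectors for s in self_set)

# The total pairwise overlap measures the redundancy the paper minimizes.
total_overlap = sum(overlap(d1, d2)
                    for i, d1 in enumerate(detectors)
                    for d2 in detectors[i + 1:])
print(f"{len(detectors)} detectors, total pairwise overlap {total_overlap:.3f}")
```

In this toy setting a "general" detector set is one whose intervals tile the nonself region with small total pairwise overlap; driving that overlap all the way to zero is exactly the aggressive minimization whose cost and detection-rate trade-off the paper examines.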