Generalization Regions in Hamming Negative Selection

Thomas Stibor 1, Jonathan Timmis 2, and Claudia Eckert 1

1 Darmstadt University of Technology, Department of Computer Science, Hochschulstr. 10, 64289 Darmstadt, Germany
2 University of York, Department of Electronics and Department of Computer Science, Heslington, York, United Kingdom

Abstract

Negative selection is an immune-inspired algorithm which is typically applied to anomaly detection problems. We present an empirical investigation of the generalization capability of Hamming negative selection when combined with the r-chunk affinity metric. Our investigations reveal that when using the r-chunk metric, the length r is a crucial parameter and is inextricably linked to the input data being analyzed. Moreover, we propose that input data with different characteristics, i.e. different positional biases, can result in an incorrect generalization effect.

1 Introduction

Negative selection was one of the first immune-inspired algorithms proposed, and is a commonly used technique in the field of artificial immune systems (AIS). Negative selection is typically applied to anomaly detection problems, which can be considered a type of pattern classification problem, and is typically employed as a (network) intrusion detection technique. The goal of (supervised) pattern classification is to find a functional mapping between input data X and a class label Y so that Y = f(X). The mapping function is the pattern classification algorithm, which is trained (or learnt) with a given number of labeled samples called training data. The aim is to find the mapping function which gives the smallest possible error in the mapping, i.e. minimizes the number of samples where Y is the wrong label (this is especially important for test data not used by the algorithm during the learning phase).
In the simplest case there are only two different classes, with the task being to estimate a function f : ℝ^N → {0, 1} ∋ Y, using training data pairs

(X_1, Y_1), ..., (X_n, Y_n) ∈ ℝ^N × Y, Y ∈ {0, 1},

generated i.i.d.¹ according to an unknown probability distribution P(X, Y), such that f will correctly classify unseen samples (X, Y). If the training data consists only of samples from one class, and the test data contains samples from two or more classes, the classification task is called anomaly detection.

¹ independently drawn and identically distributed
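The anomaly detection setting above, combined with the r-chunk metric named in the abstract, can be sketched in a few lines. The following is an illustrative assumption of the standard r-chunk matching rule (a detector is a start position p and a bit string d of length r, matching a string s when s[p:p+r] equals d); the function names and the tiny self set are invented for the example, not taken from the paper.

```python
from itertools import product

def r_chunk_matches(detector, s):
    """An r-chunk detector (p, d) matches s iff s[p:p+r] == d."""
    p, d = detector
    return s[p:p + len(d)] == d

def generate_detectors(self_set, l, r):
    """Negative selection: keep only detectors matching NO self string."""
    detectors = []
    for p in range(l - r + 1):                      # every start position
        for bits in product("01", repeat=r):        # every r-bit chunk
            d = "".join(bits)
            if not any(r_chunk_matches((p, d), s) for s in self_set):
                detectors.append((p, d))
    return detectors

def is_anomalous(s, detectors):
    """A string matched by any surviving detector is flagged anomalous."""
    return any(r_chunk_matches(det, s) for det in detectors)

self_set = {"0110", "0101"}                 # toy "normal" class, l = 4
detectors = generate_detectors(self_set, l=4, r=2)
# Training uses only one class (self); unseen non-self strings are the anomalies.
```

By construction, every string in the self set is classified as normal, while strings whose r-length windows differ from all self windows (e.g. "1111" here) are flagged; how well this generalizes to unseen normal strings depends on the choice of r, which is the question the paper investigates.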