Fast Antirandom (FAR) Test Generation

Anneliese von Mayrhauser, Andre Bai, Tom Chen, Charles Anderson, Amjad Hajjar

Dept. of Electrical Engineering, Engineering Building, Colorado State University, Fort Collins, CO 80523
ph: 970-491-6574, fax: 970-491-2249
{bai,chen,hajjar}@engr.colostate.edu

Dept. of Computer Science, University Services Building, Colorado State University, Fort Collins, CO 80523
ph: 970-491-7016, fax: 970-491-2466
{avm,anderson}@cs.colostate.edu

Abstract

Anti-random testing has proved useful in a series of empirical evaluations. Its basic premise is to choose new test vectors that are as far away from the existing test inputs as possible, where distance is measured as Hamming or Cartesian distance. Unfortunately, when used on an arbitrary set of existing test data, this method essentially requires enumeration of the input space and a distance computation for each input vector. This prevents scale-up to large test sets and/or long input vectors.

We present and empirically evaluate a technique for generating anti-random vectors that is computationally feasible for large input vectors and long sequences of tests. We also show how this fast anti-random test generation (FAR) can take retained state into account (i.e., the effects of subsequent inputs on each other). We evaluate effectiveness using branch coverage as the testing criterion.

1. Introduction

Testing techniques employ a variety of mechanisms for test generation: automated, tool-assisted, and manual. One technique that has gained support and has been shown to be useful in a series of empirical evaluations [6, 7] is anti-random testing. The basic premise of anti-random testing is that, in order to achieve higher coverage (of whatever type), one should, after having exercised a set of tests, choose new tests that are as different as possible from the tests previously used. The distance measure is Hamming or Cartesian distance. New test patterns are chosen to maximize this distance.
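As an illustration (this sketch is ours, not from the paper), the basic anti-random step over n-bit binary inputs can be written as an exhaustive search that maximizes the total Hamming distance to all previously applied vectors:

```python
from itertools import product

def hamming(a, b):
    """Number of bit positions in which vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def next_antirandom(existing, n):
    """Return the n-bit vector maximizing total Hamming distance
    to all previously applied vectors. This is the exhaustive
    enumeration over 2**n candidates that becomes infeasible for
    large n, motivating the FAR approach."""
    return max(product((0, 1), repeat=n),
               key=lambda v: sum(hamming(v, e) for e in existing))

# After applying only 000, the farthest 3-bit vector is 111.
print(next_antirandom([(0, 0, 0)], 3))  # -> (1, 1, 1)
```

The `max` over `product((0, 1), repeat=n)` makes the cost explicit: every one of the 2^n candidate vectors is scored against every existing test, which is exactly the scale-up problem the paper addresses.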
In previous analyses, this approach has improved code coverage for boundary conditions and has proved more efficient than random testing [6, 5]. When used on an arbitrary set of given test vectors, the basic method has two disadvantages:

1. It essentially requires enumeration of the input space and a distance computation for each potential input vector. This prevents scale-up to large test sets and/or long input vectors; the computations become too expensive. Enumeration is not required when starting from a single seed vector [6, 7], but this limits the applicability of the technique.

2. The input vectors from which the anti-random vectors are computed have to be binary. The current way around this restriction is "checkpoint encoding" [7]: non-binary inputs are grouped into partitions, which are then given a binary encoding, and this encoding is used for anti-random test generation. The anti-random vectors so computed are mapped back into the actual input space by selecting from each of the partitions identified by the binary encoding. Unless the partitions are very small, this approach can lead to the problems identified in [3]. On the other hand, with many small partitions the size of the input vector grows, and computation becomes expensive and quickly infeasible.

Our objective was to find a more efficient method of generating anti-random test patterns that is computationally feasible for large input vectors and long sequences of tests. This would enable a promising technique to be applied to larger problems.
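The checkpoint-encoding workaround described in item 2 above can be sketched as follows; the function name, partition layout, and decoding convention here are our own illustrative assumptions, not the paper's definitions:

```python
import random

def checkpoint_decode(bits, partitions, rng=random):
    """Map a binary anti-random vector back into a non-binary input
    domain: the bits address one of the partitions (len(partitions)
    must equal 2**len(bits)), and a concrete value is drawn from
    the selected partition."""
    index = int("".join(map(str, bits)), 2)
    return rng.choice(list(partitions[index]))

# Example: four partitions of the integer input range [0, 100),
# addressed by two bits of the anti-random vector.
parts = [range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
value = checkpoint_decode((1, 0), parts)  # bits "10" select [50, 75)
assert 50 <= value < 75
```

This also makes the trade-off in item 2 visible: coarse partitions keep the binary vector short but lump dissimilar inputs together, while fine partitions lengthen the vector and drive up the enumeration cost.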