FIPS 140-2 Statistical Test Suite Has Inappropriate Significance Levels

Song-Ju Kim, Ken Umeno, and Akio Hasegawa
Communications Research Laboratory, 4-2-1 Nukui-kitamachi, Koganei-shi, Tokyo 184-8795, Japan; songju@crl.go.jp

Abstract: We show that the FIPS 140-2 statistical test suite does not have a unique significance level. We also show that the runs test and the long-run test of this suite have significance levels that are inappropriate for further analyses, such as the success-rate check used in the NIST 800-22 statistical test suite.

Introduction: In December 2002, the statistical test suite was removed from the Federal Information Processing Standards Publication (FIPS) 140-2, Security Requirements for Cryptographic Modules. Understanding why this test suite was removed is important for developing statistical tests for randomness. Some statistical tests are based on a statistical hypothesis H₀ that a given binary sequence was produced by a random bit generator. Such a test only provides a P-value, which measures the strength of the evidence provided by the data against the hypothesis. The significance level α of a test of a statistical hypothesis H₀ is the probability of rejecting H₀ when it is true. If α is too high, the test may reject sequences that were in fact produced by a random bit generator (a Type I error). On the other hand, if α is too low, the test may accept sequences even though they were not produced by a random bit generator (a Type II error) [1]. The NIST (National Institute of Standards and Technology) 800-22 statistical test suite, which includes 16 tests, adopts two further analyses to minimize the probability of accepting sequences as the output of a good generator when the generator is actually bad [2].
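The definition of α as the probability of rejecting a true H₀ can be checked empirically. The following is a minimal sketch. Purely for illustration, it assumes the frequency (monobit) P-value of NIST 800-22, P = erfc(|S_n|/√(2n)), where S_n is the number of ones minus the number of zeros. It also uses Python's `random.Random` as a stand-in for a random bit generator. The function names are hypothetical.

```python
import math
import random

def monobit_p_value(bits: int, n: int) -> float:
    """Frequency (monobit) P-value in the style of NIST 800-22.

    `bits` holds an n-bit sequence packed into an int (hypothetical encoding);
    S_n = (#ones - #zeros) and P = erfc(|S_n| / sqrt(2n)).
    """
    ones = bin(bits).count("1")
    s_n = ones - (n - ones)
    return math.erfc(abs(s_n) / math.sqrt(2 * n))

def rejection_rate(alpha: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of sequences rejected (P-value < alpha) when H0 holds.

    random.Random is only a stand-in random bit generator (an assumption),
    so this rate estimates the Type I error probability, i.e. the
    significance level of the test.
    """
    rng = random.Random(seed)
    rejected = sum(
        monobit_p_value(rng.getrandbits(n), n) < alpha for _ in range(trials)
    )
    return rejected / trials

if __name__ == "__main__":
    for alpha in (0.1, 0.01):
        print(alpha, rejection_rate(alpha, n=1000, trials=5000))
```

The printed rates should lie near the chosen α. Because S_n is discrete, the observed rate only approximates α, and raising α trades fewer Type II errors for more Type I errors.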
In this study, we focus on the first analysis (the success-rate check) and apply it to the FIPS 140-2 statistical test suite.

Checking of success rate: A set of m sequences (the sample size) is subjected to the test, and the proportion of sequences whose P-value satisfies P-value ≥ α ("success") is calculated. If the proportion of successful sequences falls outside the following acceptable interval, there is evidence that the data are non-random:

$$P' \pm 3\sqrt{\frac{P'(1-P')}{m}}, \qquad (1)$$

where P′ = 1 − α and m is the number of sequences. This interval is the 99.73% (±3σ) range of the normal distribution, which approximates the binomial distribution under the assumption that each sequence is an independent sample.

[Figure 1: Proportion of successful sequences (vertical axis, 0.9994 to 1) versus number of tests (horizontal axis, 0 to 10⁶) for the Monobit, Poker, Long-Run, and Runs tests.]
Figure 1: The convergent behavior of the success rate for AES. Dotted lines denote the provisional acceptable interval obtained by setting the unknown significance level α = 10⁻⁴ in eq. (1) for all four tests.

Results: The FIPS 140-2 statistical test suite has four statistical tests for randomness: the monobit test, the poker test, the runs test, and the long-run test [3]. Instead of having the user select appropriate significance levels for these tests, the suite provides explicit bounds that the computed value of each statistic must satisfy. For example, the monobit test requires 9725 < (the number of ones in a 20000-bit sequence) < 10275. With this test suite, the success rate cannot be checked unless the user knows the significance levels. We investigated the convergence of the success rate of each FIPS 140-2 test using the Advanced Encryption Standard (AES in OFB mode with a 128-bit key). Figure 1 shows the convergent behavior of the success rates for AES. Dotted lines denote the provisional acceptable interval obtained by setting the unknown significance level α = 10⁻⁴ in eq. (1) for all tests.
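The success-rate check of eq. (1), applied to the explicit monobit bound, can be sketched as follows. This is a minimal illustration, not the authors' setup. It uses Python's `random.Random` as an assumed stand-in generator in place of AES-OFB, and it uses the provisional α = 10⁻⁴ from Figure 1. The helper names are hypothetical.

```python
import math
import random

N_BITS = 20000  # FIPS 140-2 tests operate on 20000-bit sequences

def acceptable_interval(alpha: float, m: int) -> tuple[float, float]:
    """Eq. (1): P' -/+ 3 * sqrt(P'(1 - P') / m), with P' = 1 - alpha."""
    p = 1.0 - alpha
    half_width = 3.0 * math.sqrt(p * (1.0 - p) / m)
    return p - half_width, p + half_width

def fips_monobit_pass(bits: int) -> bool:
    """FIPS 140-2 monobit bound: 9725 < (number of ones) < 10275."""
    ones = bin(bits).count("1")
    return 9725 < ones < 10275

def success_proportion(m: int, seed: int = 0) -> float:
    """Proportion of m 20000-bit sequences that pass the monobit bound.

    random.Random is only a stand-in generator (an assumption); the paper
    used AES in OFB mode with a 128-bit key.
    """
    rng = random.Random(seed)
    passed = sum(fips_monobit_pass(rng.getrandbits(N_BITS)) for _ in range(m))
    return passed / m

if __name__ == "__main__":
    alpha = 1e-4  # provisional significance level, as in Figure 1
    m = 10000
    lo, hi = acceptable_interval(alpha, m)
    proportion = success_proportion(m)
    verdict = "inside" if lo <= proportion <= hi else "outside"
    print(f"proportion={proportion:.6f}, interval=[{lo:.6f}, {hi:.6f}] -> {verdict}")
```

The half-width of the interval shrinks as 1/√m. For α = 10⁻⁴ it is about 3 × 10⁻⁴ at m = 10⁴ and about 3 × 10⁻⁵ at m = 10⁶.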
The horizontal axis denotes the sample size m. As we can see, the success rates converge to different values depending on the test (the values for the monobit test and the poker test are almost the same). However, for each test the convergent value is almost unchanged when we use another algorithm that is completely different from AES. This suggests two things. First, each convergent value equals 1 − α for the corresponding test, where α is that test's significance level. Second, the FIPS 140-2 test suite does not have a unique significance level.

We can see that the monobit test and the poker test have almost the same significance level, which is very close to α = 10⁻⁴. In fact, these significance levels can be calculated mathematically. The significance level of the monobit test, specified by the explicit bounds 9725 < (the number of ones in a 20000-bit sequence) < 10275, is derived from the following formula:

$$\frac{2}{\sqrt{\pi}}\int_{275/100}^{\infty} \exp(-X^2)\,dX = 0.0001006219. \qquad (2)$$

Here 275 is the half-width of the acceptance region around the expected 10000 ones. The value 100 = √(2 · 20000/4) is √2 times the standard deviation of the number of ones under H₀. The left-hand side is therefore the two-sided normal tail probability erfc(2.75).

We can also calculate the significance level of the poker test from the explicit bounds 2.16 < X < 46.17, where

$$X = \frac{16}{5000}\left(\sum_{i=0}^{15} [f(i)]^2\right) - 5000,$$

and f(i) is the number of occurrences of the i-th configuration among the 5000 consecutive 4-bit segments of the 20000-bit sequence. It is