Permutation k-sample Goodness-of-Fit Test for Fuzzy Data

Przemyslaw Grzegorzewski
Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland; and Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
pgrzeg@mini.pw.edu.pl

Abstract—The problem of testing goodness-of-fit for k distributions based on fuzzy data is considered. A new permutation test for fuzzy random variables is proposed. Besides the general construction of the test, an algorithm ready for practical use is delivered. A case study illustrating the applicability of the suggested testing procedure is also presented.

Index Terms—fuzzy data, fuzzy number, fuzzy random variable, goodness-of-fit test, permutation test, random fuzzy number, trapezoidal fuzzy number

I. INTRODUCTION

Most statistical procedures are constructed with fairly specific assumptions regarding the underlying population distribution. In particular, one of the most often used techniques for comparing several treatments, i.e. analysis of variance (ANOVA), assumes not only that the observations are independent and that all populations are normally distributed, but also that their variances are homogeneous. Obviously, such strong assumptions are quite often not satisfied. Unfortunately, ANOVA, like some other statistical tests, is sensitive to violations of the fundamental model assumptions inherent in its derivation. In such cases distribution-free methods, also called nonparametric methods, are very useful. In particular, the Kruskal-Wallis test can be used for comparing several independent samples. This test, unlike ANOVA, requires neither normality nor homogeneity of variances, and that is why it is sometimes called the nonparametric analogue of one-way analysis of variance.

Real-life data sets often consist of imprecise or vague observations.
In particular, many human ratings based on opinions or associated with perceptions lead to data that cannot be expressed on a numerical scale. Such data consist of intrinsically imprecise or fuzzy elements. Thus, if they also appear as a realization of some random experiment, we are faced with fuzzy random variables that cannot be analyzed by classical statistical methods and require a different, adequate approach.

Several approaches have been developed in the literature for testing statistical hypotheses with fuzzy data. Depending on the context and on whether data are perceived from the epistemic or the ontic view (see [4]), various test constructions have appeared in the literature (for an overview we refer the reader, e.g., to [8], [9], [12], [13]–[15], [17], [22], [26], [27], [28], [32], [33]). In particular, the problem of testing the equality of k samples against the so-called "simple-tree alternative" or "many-one problem" for fuzzy data, based on the necessity index of strict dominance, is considered in [17]. A bootstrap test for the equality of fuzzy means of k populations can be found in [9], while [33] contains a bootstrap procedure for testing the homoscedasticity of k populations.

One may wonder why the nonparametric Kruskal-Wallis test has not been generalized to fuzzy data. The reason is that the Kruskal-Wallis test is based on ranks, which cannot be determined for fuzzy samples since fuzzy numbers are not linearly ordered. However, the Kruskal-Wallis test is actually a k-sample goodness-of-fit test, since its null hypothesis states that all k samples under study come from the same distribution (which is equivalent to the statement that all k populations are identically distributed). Therefore, one may consider another construction of the k-sample goodness-of-fit test, one that does not need any ranks. Such a construction, based on permutations, is proposed in this contribution.

The paper is organized as follows: in Sec.
II we introduce the notation and recall basic concepts related to fuzzy data modeling and operations on fuzzy numbers. Sec. III is devoted to fuzzy random variables. In Sec. IV we propose the general idea of the k-sample goodness-of-fit permutation test for fuzzy data. Besides the test construction, we deliver a testing algorithm ready for practical use. Next, in Sec. V we adapt the general construction of the suggested test to trapezoidal fuzzy numbers. Then we present some results of the simulation study (Sec. VI) and the case study (Sec. VII) with the proposed test. Finally, conclusions and some indications for further research are given in Sec. VIII.

II. FUZZY DATA

A fuzzy number is an imprecise value characterized by a mapping $A : \mathbb{R} \to [0, 1]$ (called a membership function) such that its α-cut, defined by

$$
A_\alpha =
\begin{cases}
\{x \in \mathbb{R} : A(x) \geq \alpha\} & \text{if } \alpha \in (0, 1], \\
\mathrm{cl}\{x \in \mathbb{R} : A(x) > 0\} & \text{if } \alpha = 0,
\end{cases}
\tag{1}
$$

is a nonempty compact interval for each $\alpha \in [0, 1]$. The operator cl in (1) stands for the closure. Thus every fuzzy number is completely characterized both by its membership function $A(x)$ and by the family of its α-cuts $\{A_\alpha\}_{\alpha \in [0,1]}$.

978-1-7281-6932-3/20/$31.00 ©2020 IEEE

Two α-cuts