Using Rank-Order Geostatistics for Spatial Interpolation of Highly Skewed Data in a Heavy-Metal Contaminated Site

Kai-Wei Juang, Dar-Yuan Lee,* and Timothy R. Ellsworth

ABSTRACT

The spatial distribution of a pollutant in contaminated soils is usually highly skewed. As a result, the sample variogram often differs considerably from its regional counterpart, which hinders geostatistical interpolation. In this study, rank-order geostatistics with the standardized rank transformation was used for the spatial interpolation of pollutants with highly skewed distributions in contaminated soils, for cases in which commonly used nonlinear methods, such as logarithmic and normal-scored transformations, are not suitable. A real data set of soil Cd concentrations with great variation and high skewness from a contaminated site in Taiwan was used for illustration. The spatial dependence of the ranks transformed from Cd concentrations was identified, and kriging estimation was readily performed in the standardized-rank space. The estimated standardized ranks were back-transformed into the concentration space using the middle point model within a standardized-rank interval of the empirical distribution function (EDF), which yielded the spatial distribution of Cd concentrations. The probability of the Cd concentration being higher than a given cutoff value can also be estimated from the estimated distribution of standardized ranks. The contour maps of Cd concentrations and of the probabilities of Cd concentrations exceeding the cutoff value can be used together to delineate hazardous areas of contaminated soils.

Geostatistical interpolation (kriging) provides the best linear unbiased prediction for spatially dependent properties. Recently, kriging has been used for the spatial interpolation of pollutants in contaminated soils (Arrouays et al., 1996). However, the great variability of pollutant distributions in soils, in conjunction with sparse sampling, can mask the spatial dependence. In previous studies (Juang and Lee, 1998a; Juang et al., 1999), we found that the spatial distributions of heavy metals in contaminated soils show great variation and high skewness. Moreover, in a few cases locally extreme values are surrounded by much smaller values. In this situation, observations vary greatly over short distances, and the fitted semivariogram model usually has a large nugget effect. A large nugget effect means that the variable is irregular and discontinuous. If a pure nugget effect occurs, which entails a complete lack of spatial correlation, the kriging estimate reduces to the simple arithmetic average of the sampled data, and any map generated by kriging will not be very meaningful.

Logarithmic transformation is one approach commonly used to detect spatially dependent structures in highly skewed data (Cambardella et al., 1994; Litaor, 1995; Van Meirvenne et al., 1996). Logarithmic transformation is applied to data following a lognormal distribution, after which the lognormal kriging estimator can be used for spatial interpolation (Journel, 1980). The lognormal kriging estimator provides an approximately unbiased estimate, although its error estimates are often exaggerated. It works well only when the log-transformed data form a Gaussian random function. For highly skewed data, it is therefore necessary to check whether the univariate distribution of the data is lognormal before using lognormal kriging.

On the other hand, the normal-scored transformation is an alternative for dealing with data that have a positively skewed distribution with a few extreme values. This method can transform any data set with an asymmetric distribution into normal scores, which have a standard normal distribution. Kriging estimation can then be performed in the normal-scored space. This approach is based on a multi-Gaussian model. Kriging estimation in the normal-scored space is valid only when the normal-scored data strictly follow the multi-Gaussian (also called multi-normal) distribution. In practice, it is difficult to ensure that normal-scored data are multi-Gaussian. Goovaerts (1997) suggested checking whether normal-scored data are reasonably bi-Gaussian. If they are, the multi-Gaussian model may be tenable; if they are not, another approach should be considered.

Both logarithmic and normal-scored transformations require the transformed data to follow a specific distribution function (such as a Gaussian or multi-Gaussian distribution). Unfortunately, real data sets may not meet these strict requirements. Recently, Journel and Deutsch (1997) proposed an approach, termed rank-order geostatistics, for integrating information from diverse data types, scales, supports, and accuracies. This approach uses the standardized rank transformation, which places no specific requirement on the distribution of the transformed data. Kriging estimation can be performed in the standardized-rank space, and the kriging estimates can then be back-transformed into the original space. Rank-order geostatistics is therefore an alternative for dealing with highly skewed data. In addition, the standardized ranks follow a uniform distribution. One can determine the uniform distribution at any unsampled location from the kriging estimate and variance.
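The standardized rank transformation, and the uniform distribution it implies at an unsampled location, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the following:

- The standardized rank is u = r/n, with tied values given their average rank.
- The local uniform distribution has a mean equal to the kriging estimate and a variance equal to the kriging variance, clipped to the rank domain [0, 1].
- A concentration cutoff is mapped into the rank domain through the EDF.

The kriging step itself is omitted, and `u_est` and `u_var` stand in for its output. All function names and sample values are hypothetical.

```python
import math
from bisect import bisect_right


def standardized_ranks(values):
    """Map each datum to its standardized rank u = r / n in (0, 1].

    Assumption: tied values receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def edf(values, z):
    """Empirical distribution function F(z) = #{z_i <= z} / n."""
    s = sorted(values)
    return bisect_right(s, z) / len(s)


def uniform_bounds(u_est, u_var):
    """Bounds [a, b] of a uniform distribution with mean u_est, variance u_var.

    For U(a, b) the variance is (b - a)**2 / 12, so the half-width is
    sqrt(3 * u_var). Assumption: bounds are clipped to the rank domain [0, 1].
    """
    half = math.sqrt(3.0 * u_var)
    return max(0.0, u_est - half), min(1.0, u_est + half)


def prob_exceed(u_est, u_var, u_cut):
    """P(U > u_cut) under the (clipped) local uniform distribution."""
    a, b = uniform_bounds(u_est, u_var)
    if b <= a:  # zero variance: the distribution collapses to a point
        return 1.0 if u_est > u_cut else 0.0
    return min(1.0, max(0.0, (b - u_cut) / (b - a)))
```

Consider the made-up concentrations `[0.5, 1.2, 0.8, 45.0, 2.3, 0.9, 120.0, 1.1]`. The two extreme values are mapped to ranks 0.875 and 1.0, which sit no farther from their neighbours than any other pair of data (a spacing of 1/8). Now suppose kriging in rank space gave `u_est = 0.6` and `u_var = 0.03` at some location. The uniform distribution would then span [0.3, 0.9]. For a cutoff of 2.3, whose EDF value is 0.75, `prob_exceed` returns 0.25.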
Kai-Wei Juang and Dar-Yuan Lee, Graduate Institute of Agricultural Chemistry, National Taiwan Univ., Taipei, 106 Taiwan. Timothy R. Ellsworth, Dep. of Natural Resources and Environmental Sciences, Univ. of Illinois, Urbana–Champaign, IL 61801. Received 6 Mar. 2000. *Corresponding author (dylee@ccms.ntu.edu.tw).

Published in J. Environ. Qual. 30:894–903 (2001). Published May, 2001.

Abbreviations: CDF, cumulative distribution function; EDA, exploratory data analysis; EDF, empirical distribution function; ME, mean error; MSRE, mean square relative error.

The uniform distribution can be used to calculate the probability of the attributes of interest being higher than a given cutoff