A One-class Classification Framework using SVDD: Application to an Imbalanced Geological Dataset

Soumi Chaki 1, Akhilesh Kumar Verma 2, Aurobinda Routray 1, William K. Mohanty 2, Mamata Jenamani 3

1 Department of Electrical Engineering, IIT Kharagpur, Kharagpur, India (soumibesu2008@gmail.com, aroutray@ee.iitkgp.ernet.in)
2 Department of Geology and Geophysics, IIT Kharagpur, Kharagpur, India (akhileshdelhi2007@gmail.com, wkmohanty@gg.iitkgp.ernet.in)
3 Department of Industrial and Systems Engineering, IIT Kharagpur, Kharagpur, India (mj@iem.iitkgp.ernet.in)

Abstract: Evaluation of a hydrocarbon reservoir requires classification of petrophysical properties from the available dataset. However, characterization of reservoir attributes is difficult due to the nonlinear and heterogeneous nature of subsurface physical properties. In this context, the present study proposes a generalized one-class classification framework based on Support Vector Data Description (SVDD) to classify a reservoir characteristic, water saturation, into two classes (Class high and Class low) from four logs, namely gamma ray, neutron porosity, bulk density, and P-sonic, using an imbalanced dataset. The proposed framework is compared with several supervised classification algorithms in terms of g-metric mean and execution time. Experimental results show that the proposed framework outperforms the other classifiers on both measures. The classification analysis performed in this study is expected to be useful in further reservoir modeling.

Keywords: support vector data description; g-metric mean; one-class classification; imbalanced dataset

I. INTRODUCTION

Reservoir quantification for hydrocarbon production poses several challenges.
These issues include classification of different lithological units, integration of different types of data recorded in different domains, non-uniform sampling, and the heterogeneous characteristics of reservoir variables. Heterogeneity, i.e. the non-uniform and nonlinear character of reservoir properties, makes reservoir modeling difficult. Such modeling is carried out using state-of-the-art nonlinear approaches such as Artificial Neural Networks (ANN), Fuzzy Logic (FL), and Genetic Algorithms (GA). Some applications of these methods in petroleum reservoir modeling are discussed in [1]–[4]. However, it has been observed that the accuracy of reservoir modeling can be improved using classification-based approaches [5]. Thus, classification of petrophysical parameters is beneficial for reservoir studies.

Classification is, however, a complex task whose performance depends on the available subsurface information. Supervised classifiers are generally preferred over unsupervised clustering algorithms due to the complex nature of the problem. Nevertheless, supervised classifiers need a complete and representative training dataset to learn accurately. An imbalanced dataset does not satisfy these requirements. Moreover, an underrepresented training dataset may have several class distribution skews. Recently, the problem of learning from imbalanced datasets has attracted interest from researchers because such datasets occur in real-world applications [6]–[9]. Kernel-based methods have gained acceptance over other supervised classification methods for imbalanced datasets, especially in remote sensing [10]–[12]. Support Vector Data Description (SVDD) is a recent kernel-based algorithm that has attracted attention from researchers in different fields for its ability to learn without any a priori knowledge of the distribution of the dataset [13]–[15].
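The difficulty with imbalanced training data described above can be illustrated with a minimal sketch. It assumes that the g-metric mean is the geometric mean of sensitivity and specificity, which is the usual definition in the imbalanced-learning literature. The labels, class proportions, and function names below are hypothetical and are not taken from the well-log dataset.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(TP / (TP + FN) * TN / (TN + FP)).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced labels: 95 majority samples (0), 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

# A classifier that ignores the minority class entirely.
always_majority = [0] * 100

# A classifier that misses some majority samples but finds 4 of the 5 minority samples.
balanced_guess = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

for name, y_pred in [("always_majority", always_majority),
                     ("balanced_guess", balanced_guess)]:
    print(name, "accuracy:", accuracy(y_true, y_pred),
          "g-mean:", round(g_mean(y_true, y_pred), 3))
```

In this toy case the classifier that always predicts the majority class has the higher accuracy but a g-mean of zero, while the second classifier has lower accuracy and a nonzero g-mean. This is why a measure of this kind is preferred over plain accuracy for imbalanced data.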
The first contribution of this paper is a generalized framework based on Support Vector Data Description (SVDD) [13], [14] to characterize water saturation from input well logs. The second is a comparative analysis that demonstrates the effectiveness of the proposed classification method against other classifiers (discriminant analysis [16], [17], naive Bayes [16], [18], and a support vector machine based classifier [19], [20]). Data from four closely spaced wells are used in this study. The combined data from three wells are used for training, and the remaining well is used for testing.

The rest of the paper is structured as follows. First, the data used in this study are described. Next, the theory of SVDD is briefly presented, followed by a description of the proposed classification framework. Then, the performance evaluators used in this work are briefly described. The subsequent section reports the experimental results. Finally, the paper concludes with a discussion and future scope.

II. DESCRIPTION OF DATASET

The well logs used in this work were acquired from four closely spaced boreholes located in an onshore hydrocarbon field in India. Henceforth, these wells are referred to as A, B, C, and D, respectively. The borehole data contain several logs such as gamma ray content (GR),