Validation of production data by using an AI-based classification methodology; a case in the Gulf of Mexico

Olivia Patricia Quiñónez-Gámez *, Rodolfo G. Camacho-Velázquez

PEMEX E&P, Mexico

Article history: Received 4 November 2010; received in revised form 21 June 2011; accepted 28 July 2011; available online 26 August 2011.

Keywords: Validation of production data; Classification methodology for production data; Artificial intelligence applications in production data validation; Artificial intelligence-based classification methodology; Data mining applications in the E&P industry

Abstract

Production data quality is a topic of general interest in the oil industry, because in mature fields production data are often the only information available in sufficient quantity. With the development of information systems, oil companies now hold large volumes of data, but not all of them are reliable. Errors may have various origins, starting with problems in data acquisition systems. Contaminated data may cause operational failures and lead to poor decision making. A methodology for identifying contaminated data was applied to determine the quality of the production dataset of an oil field. The methodology, based on data mining techniques, combines a fuzzy classification algorithm, neural network modeling and an iterative process. It was applied to a real case, the database of an offshore field in Mexico, and classified the data as good, slightly contaminated or bad. The decline behavior of a well was then evaluated using the good and slightly contaminated data, and the results were satisfactory. We conclude that this classification methodology based on intelligent algorithms provides a simple solution to the problem of determining production data quality. We found that this methodology can be applied to any dataset.
Of course, there is a degree of subjectivity in the methodology: if the restriction criteria for classification are changed, the data quality determination may change. Furthermore, applying the proposed methodology demonstrated the effectiveness of data mining tools in estimating missing data.

© 2011 Elsevier B.V. All rights reserved.

1. Background

Analyzing and interpreting the behavior of a reservoir using only its production and pressure history is a common practice; in most cases, for wells in an advanced stage of exploitation, this is the only information available in sufficient quantity. The information used in production decline analysis is frequently not fully reliable, mainly because of contaminated data that can lead to incorrect production forecasts. In some cases the quality of the data can be determined by basic statistical analysis, but this technique is not always sufficient, so other ways of solving the problem need to be explored.

Anderson et al. (2006) proposed a set of guidelines for the analysis of production data, including a diagnostic analysis that validates the correlation between rate and pressure and establishes a reservoir model using diagnostic plots such as log(q/Δp) versus log(Np/q), where Np/q (cumulative production divided by rate) is the material balance time. These diagnostic plots can reveal what is wrong with production and pressure data, but in a qualitative rather than an analytical manner. The authors also show that common problems, such as missing rate and/or pressure information and erroneous data, can have a decisive influence on the analysis and interpretation of production decline data. Hence it is important to identify and correct these anomalies.

Popa et al. (2003) proposed a data validation methodology that was successfully applied to a hydraulic fracturing database.
* Corresponding author. Tel.: +52 55 1944 9051. E-mail addresses: olivia.patricia.quinonez@pemex.com, pquinonezg@gmail.com (O.P. Quiñónez-Gámez).

Journal of Natural Gas Science and Engineering 3 (2011) 729-734. ISSN 1875-5100. doi:10.1016/j.jngse.2011.07.015. Journal homepage: www.elsevier.com/locate/jngse

The methodology of Popa et al. (2003) is based on the following hypothesis: in a well-behaved system, the output should be able to contribute to its own prediction and identification. The purpose of this study was to apply this methodology, together with the concept of entropy, which measures the degree of dissimilarity between elements, to the production database of a Mexican offshore field in order to identify contaminated data. This field is a naturally fractured reservoir with 54 wells drilled in Upper Cretaceous carbonate breccias, and it began production in the early 1980s. Currently, most of the wells in this field use