Validation of production data by using an AI-based classification methodology; a case in the Gulf of Mexico

Olivia Patricia Quiñónez-Gámez *, Rodolfo G. Camacho-Velázquez

PEMEX E&P, Mexico

Article info

Article history: Received 4 November 2010; Received in revised form 21 June 2011; Accepted 28 July 2011; Available online 26 August 2011

Keywords: Validation of production data; Classification methodology for production data; Artificial intelligence applications in production data validation; Artificial intelligence-based classification methodology; Data mining applications in the E&P industry

Abstract

Production data quality is a topic of general interest in the oil industry. Production data are often the only information available in sufficient quantity in mature fields. With the development of information systems, oil companies have accumulated large volumes of data, but not all of them are reliable. The errors may have various origins, starting with problems in data acquisition systems. The existence of contaminated data may cause operational failures and lead to inappropriate decisions. A methodology for the identification of contaminated data was applied in order to determine the quality of a production dataset of an oil field. To achieve this objective, a methodology based on data mining techniques, combining a fuzzy classification algorithm, neural network modeling and an iterative process, was applied to a real case, a database of an offshore field in Mexico, and the result was the classification of the data as good, slightly contaminated or bad. The decline behavior of a well was evaluated with good and slightly contaminated data, and the results were appropriate. We concluded that this classification methodology based on intelligent algorithms generated a simple solution to the problem of quality determination of production data. We found that this methodology can be applied to any dataset.
Of course, there is a degree of subjectivity in the methodology: if the restriction criteria for classification are changed, the data quality determination may change. Furthermore, the application of the proposed methodology demonstrated the effectiveness of data mining tools in the estimation of missing data.

© 2011 Elsevier B.V. All rights reserved.

1. Background

The analysis and interpretation of the behavior of a reservoir using only production and pressure history is a common practice, and in most cases this history is the only information available in sufficient quantity for wells in an advanced stage of exploitation. It is frequently observed that the information used during production decline analysis is not totally reliable, mainly because of the existence of contaminated data that can lead to incorrect production forecasts. In some cases the problem of determining data quality can be solved by means of basic statistical analysis, but sometimes this technique is not enough. Thus, there is a need to explore other ways to solve this problem.

Anderson et al. (2006) proposed a set of guidelines for the analysis of production data, including a diagnostic analysis that considers the validation of the correlation between rate and pressure and the establishment of a reservoir model using diagnostic plots such as log(q/Δp) vs. log(Np/q). These diagnostic plots may highlight what is wrong with production and pressure data in a qualitative rather than analytical manner. The authors also show that common problems, such as the absence of rate and/or pressure information and the existence of erroneous data, may have a definitive influence on the analysis and interpretation of production decline data. Hence it is important to identify and correct these anomalies.

Popa et al. (2003) proposed a methodology to validate data, which was successful when applied to a hydraulic fracturing database.
This methodology is based on the following hypothesis: in a well-behaved system, the output should be able to contribute to its own prediction and identification.

* Corresponding author. Tel.: +52 55 1944 9051. E-mail addresses: olivia.patricia.quinonez@pemex.com, pquinonezg@gmail.com (O. P. Quiñónez-Gámez).
Journal of Natural Gas Science and Engineering 3 (2011) 729–734. Journal homepage: www.elsevier.com/locate/jngse
1875-5100/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jngse.2011.07.015

The purpose of this study was to apply this methodology and the concept of Entropy, which measures the degree of lack of similarity between elements, to a production database of a Mexican offshore field in order to identify contaminated data. This field is a naturally fractured reservoir with 54 wells drilled in carbonate breccias of the Upper Cretaceous, and began production in the early 80s. Currently, most of the wells in this field use