Knowledge Discovery Process Methodology Developments
Twenty-fifth Americas Conference on Information Systems, Cancun, 2019

Developments in Knowledge Discovery Processes and Methodologies: Anything New?
Completed Research

Jeroen Baijens
Open University, the Netherlands
jeroen.baijens@ou.nl

Remko W. Helms
Open University, the Netherlands
remko.helms@ou.nl

Abstract
The process of turning data into knowledge is referred to as “knowledge discovery” (KD) and originated in the 1990s. Since then, many different process models and methodologies have been developed. A genealogy published in 2010 showed how the different models evolved and proposed a refined process model that synthesizes the earlier models. However, the rise of data analytics and big data has changed how organizations do business. Key to these changes is turning data into knowledge that creates value for the organization. Therefore, this study aims to update our understanding of knowledge discovery processes by reviewing the research into KD processes since 2010, in order to determine whether there have been considerable changes and developments in this field. The developments found in KD process models and methodologies are threefold: tasks, steps and agile practices.

Keywords
Knowledge discovery process, Process model, Process methodology, Agile practice, Big data.

Introduction
Nowadays organizations are interested in creating value from data by drawing on analytical techniques to convert raw data into actionable knowledge. This knowledge supports managerial decision-making and allows the organization to take actions that might help create or sustain competitive advantage (Provost and Fawcett 2013). The process of using data to create knowledge was already studied in the 1990s and was referred to as “knowledge discovery” (KD), or “knowledge discovery in databases”, by Fayyad, Piatetsky-Shapiro, and Smyth (1996).
Today, practitioners and academics often use the terms “data analytics” or “data science” interchangeably with the older term knowledge discovery (Chen et al. 2012).

The research program into KD, which started in the late 1990s, has resulted in an abundance of proposed process models and methodologies developed by academics as well as practitioners. The most well-known model is CRISP-DM, which was developed by a consortium of industry and academic representatives (Chapman et al. 2000). Mariscal, Marbán and Fernández (2010) reviewed the existing literature on KD process models and proposed a refined KD process model based on a synthesis of the existing process models and methodologies. The resulting model consists of 3 main processes and 17 sub-processes and is the greatest common divisor of the models they analyzed. Rather than a process model, it is better called a framework, since it only identifies the main processes and sub-processes without further detailing them or providing a complete methodology.

However, despite the abundance of models, a survey among data science professionals revealed that 82% of them did not use any existing process model or methodology for knowledge discovery (Saltz, Wild, Hotz and Stirling 2018). Critics of the process models and methodologies argue that they are too rigid and do not support the iterative and open nature of most KD projects (Saltz 2015). This modest uptake might be explained by the fact that most models and methodologies are still very rudimentary and do not fit every situation. Mariscal et al. (2010) called for further research into the KD process that extends the models and methodologies by borrowing from other fields (e.g. software development). Since 2010, several studies have been conducted to further develop and extend the KD