Journal of Computational Science 3 (2012) 504–510
Journal homepage: www.elsevier.com/locate/jocs

CLAVIRE: e-Science infrastructure for data-driven computing

Konstantin V. Knyazkov, Sergey V. Kovalchuk, Timofey N. Tchurov, Sergey V. Maryin, Alexander V. Boukhanovsky
University ITMO, St. Petersburg, Russian Federation

Article info
Article history: Received 5 April 2012. Received in revised form 19 August 2012. Accepted 22 August 2012. Available online 29 August 2012.
Keywords: Distributed computing; Workflow; Domain-specific language; Data-driven approach; e-Science; Composite application

Abstract
The paper introduces the CLAVIRE (CLoud Applications VIRtual Environment) platform. The architecture of the platform is presented with a focus on the abstraction that enables the integration of distributed computational resources, data sources, and software. The coupled domain-specific languages EasyFlow and EasyPackage for unified workflow design are presented. Five classes of user interfaces are proposed as a basis for human–computer interaction support in CLAVIRE. An interactive workflow model is implemented as a prospective approach for data-driven composite applications.
© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Nowadays scientific experiments often require a huge amount of computation during simulation and data processing. The performance of contemporary supercomputers is increasing rapidly, which makes it possible to solve computation-intensive scientific problems by processing large arrays of data stored in archives or produced by sensor networks. Thus, today we can speak of a new paradigm for scientific research, often called e-Science [1]. This paradigm introduces many issues that have to be solved through collaboration between ICT specialists and domain scientists.
As the paradigm is closely tied to processing large arrays of data that are observed in nature or produced by simulation software, new tools are required within the data-driven approach (DDA) for arranging the available resources to solve e-Science tasks [2]. Developing a computational infrastructure for DDA-computing requires integrating heterogeneous computing systems, ubiquitous sensors, imaging devices, and other data-gathering devices, and developing methodologies and theoretical frameworks for their integration in dynamic simulation systems [3]. Investigating the abstractions that allow distributed resources to be integrated is an issue of particular importance [4,5] for the development of e-Science infrastructure.

Corresponding author: S. V. Kovalchuk. E-mail address: sergey.v.kovalchuk@gmail.com

Contemporary computational tasks are characterized by structural complexity: since they include many subtasks, they require different resources (software, hardware, data storage, decision-making procedures, etc.) to be composed within one solution. Today one of the most popular solutions for joining distributed resources is the workflow (WF) approach [6], which organizes the interaction between different resources presented as services within a computational environment. This approach has been successfully applied in a number of e-Science infrastructures by means of WF-management systems (WMS) (e.g. [7,8]). Nevertheless, within the DDA, given the great diversity of resources of all categories (hardware as well as software and data resources), the problem of resource interoperability still remains.

This paper presents the CLAVIRE (CLoud Applications VIRtual Environment) platform as an e-Science infrastructure platform for DDA-computing.
The platform supports a high-level abstract description of computational processes in terms of composite applications, using a set of domain-specific software and distributed data sources available within the service-oriented distributed computational environment. Composite applications in CLAVIRE are described using abstract software calls (which do not specify particular resources) that can be mapped onto the available hardware resources. This makes it possible to run the software on different computational platforms (including environments such as Grids or cloud infrastructures) using an automatic scheduling procedure for resource selection.

1877-7503/$ see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jocs.2012.08.006
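
The idea of an abstract software call that is bound to a resource only at scheduling time can be illustrated with a minimal Python sketch. It is not CLAVIRE code and does not reproduce EasyFlow or EasyPackage syntax: the names AbstractCall, Resource, and schedule, the example package and resource names, and the selection policy (pick the suitable resource with the most free cores) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AbstractCall:
    """A software call that names a package and its inputs, but no resource."""
    package: str
    params: dict


@dataclass(frozen=True)
class Resource:
    """A computational resource and the packages available on it."""
    name: str
    packages: frozenset
    free_cores: int


def schedule(call, resources, cores_needed=1):
    """Map an abstract call onto one resource (hypothetical policy).

    Among the resources that provide the requested package and have enough
    free cores, return the one with the most free cores.
    """
    candidates = [
        r for r in resources
        if call.package in r.packages and r.free_cores >= cores_needed
    ]
    if not candidates:
        raise LookupError(f"no resource can run package {call.package!r}")
    return max(candidates, key=lambda r: r.free_cores)


# Hypothetical environment: a cluster, a cloud, and a Grid site.
resources = [
    Resource("cluster-a", frozenset({"wave_model"}), free_cores=8),
    Resource("cloud-b", frozenset({"wave_model", "post_processing"}), free_cores=32),
    Resource("grid-c", frozenset({"post_processing"}), free_cores=64),
]

# The call itself never mentions a resource; binding happens in schedule().
call = AbstractCall("wave_model", {"grid_step_km": 5})
print(schedule(call, resources).name)  # prints "cloud-b"
```

In this sketch the description of the computation (AbstractCall) stays unchanged when resources are added or removed; only the list passed to schedule changes, which is the separation the abstract description is meant to provide.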