Functional Validation in Grid Computing

GUOFEI JIANG guofei.jiang@dartmouth.edu
Institute of Security Technology Studies and Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA

GEORGE CYBENKO gvc@dartmouth.edu
Institute of Security Technology Studies and Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA

Abstract. The development of the World Wide Web has changed the way we think about information. Information on the web is distributed, updates are made asynchronously, and resources come online and go offline without centralized control. Global networking will similarly change the way we think about and perform computation. Grid computing refers to computing in a distributed networked environment where computing and data resources are located throughout a network. In order to locate these resources dynamically in a grid computation, a broker or matchmaker uses keywords and ontologies to describe and specify grid services. However, we believe that keywords and ontologies cannot always be defined or interpreted precisely enough to achieve deep semantic agreement in a truly distributed, heterogeneous computing environment. To this end, we introduce the concept of functional validation. Functional validation goes beyond the symbolic level of brokering and matchmaking to the level of validating the actual functional performance of grid services. In this paper, we present the functional validation concept in grid computing, analyze the possible validation situations, and apply basic machine learning theory such as PAC learning and Chernoff bounds to explore the relationship between sample size and confidence in service semantics.

Keywords: grid computing, service matching, keywords and ontology, functional validation, PAC learning.

1. Introduction

Several efforts are currently underway to build computational grids, such as Globus [1] and the DARPA CoABS Grid [2].
Grids are execution environments that enable an application to integrate geographically distributed computation and data resources. Grid computations may link tens or hundreds of these distributed resources for large-scale computations. The vision is that these grid infrastructures will connect multiple regional and national computational grids and create a universal source of pervasive and dependable computing power that supports dramatically new classes of applications. A fundamental capability required in such a grid is a semantic broker that dynamically matches user requirements with available resources. On the web, search engines that index web pages and implement retrieval services provide this capability. Whereas humans are typically the consumers of web information, grid agents are the producers and consumers of grid resources, with humans occasionally steering or interpreting the computations. When a grid computation needs a computational service such as a multidimensional Fourier transform, it discovers the required service by consulting a distributed object request broker or a matchmaking service.

Autonomous Agents and Multi-Agent Systems, 8, 119–130, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

(An object request broker (ORB) not only locates a component or object that performs the required service but also mediates