Remote Batch Invocation for SQL Databases William R. Cook Department of Computer Science University of Texas at Austin wcook@cs.utexas.edu Ben Wiedermann Department of Computer Science Harvey Mudd College benw@cs.hmc.edu ABSTRACT Batch services are a new approach to distributed computation in which clients send batches of operations for execution on a server and receive hierarchical results sets in response. In this paper we show how batch services provide a simple and powerful interface to relational databases, with support for arbitrary nested queries and bulk updates. One important property of the system is that a single batch statement always generates a constant number of SQL queries, no matter how many nested loops are used. 1. INTRODUCTION Batches are a new programming model for efficient access to distributed services [15, 16]. The key architectural difference from previous approaches is that batches require clients to send collec- tions of operations to servers rather than individual messages/method invocations. The collection of operations sent by a client are rep- resented as scripts written in a domain-specific language designed specifically to represent batches of related operations. With batches, conventional wisdom is inverted: fine-grained interfaces are en- couraged, proxies are avoided, serialization is not needed. The key innovation that makes batches work is a new client-side invocation model, the batch statement. A batch statement specifies a remote root and a block of state- ments that can intermingle operations on the remote root and ordi- nary local objects. The examples in this paper are written in Jaba, a version of Java extended with a batch statement. The following code fragment prints information about files located on a remote server: 1 batch (File root : new TCPClient<File>("74.1.9.14", 1025)) 2 for (File f : root.listFiles()) 3 if (f.length() > 500) 4 System.out.println(f.getName() + ": " + f.lastModified()); The batch statement defines root as a File handle on a remote server with a given IP address and port number. The body of the 1 This material is based upon work supported by the National Sci- ence Foundation under Grants #0448128 and #0724979. 2 The Jaba compiler allows the for keyword to be used in place of batch, so that Jaba programs are compatible with Java syntax. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. This article was presented at: DBPL ’11. Copyright 2011. batch statement in lines 2–4 iterates over each remote file and prints its name and last-modified date if the file’s length exceeds a given length. The block intermingles remote operations (e.g., getting a remote file’s name) and local operations (e.g., printing the file’s name). Although this code follows normal syntax and typing con- ventions, its execution model is radically different from ordinary sequential execution: Remote and local operations are reordered relative to one another, duplicating loops and conditionals where necessary, so that all remote operations are performed in one round trip to the server. The order of local operations is preserved, but the semantics of remote operations is defined by the server. Step 1 only succeeds if there are no back and forth interdepen- dencies between local and remote operations, otherwise a compiler error is issued. The compiler then translates the batch statement to send the remote operations to the server as a batch script. The batch script for the example given above produces a table with two columns, labeled A and B: 1 for (f : * .listFiles()) / * generated batch script code * / 2 if (f.length() > 500) 3 A: f.getName() 4 B: f.lastModified() Details on batch scripts, updated to support database access, are presented in Section 3. The resulting tree is a hierarchy of records with tagged values, similar to JSON, XML, or hierarchical tables. The remaining client code produced by the compiler, once the re- mote operations are removed, is the following: 1 for (r in resultSet) / * generated code * / 2 System.out.println(r.get("A") + ": " + r.get("B")); Note that it does not contain any remote operations. The batch execution model does not involve the use of proxies or object seri- alization. The batch script creates a dynamic Service Façade on the server, and the result tree is a dynamically created Data Transfer Object [10]. The end result is a natural fine-grained programming model that provides easy access to remote services. Batches generalize the communication model used for access to database servers, where a small imperative scripting language re- places SQL, and results are hierarchical record sets, not flat tables. Thus it is not surprising that batches can be used to encode access to relational databases as a special case. 2. BATCHES FOR DATABASE ACCESS Batches combine the two most important requirements for database access: a natural programming model and optimized execution. Batches are a natural programming model because they allow clients