IEEE TRANSACTIONS ON COMPUTERS, VOL. C-33, NO. 12, DECEMBER 1984

A Perspective on Distributed Computer Systems

JOHN A. STANKOVIC, MEMBER, IEEE

(Invited Paper)

Abstract - Distributed computer systems have been the subject of a vast amount of research. Many prototype distributed computer systems have been built at university, industrial, commercial, and government research laboratories, and production systems of all sizes and types have proliferated. It is impossible to survey all distributed computing system research. Instead, this paper identifies six fundamental distributed computer system research issues, points out open research problems in these areas, and describes how these six issues and solutions to problems associated with them transect the communications subnet, the distributed operating system, and the distributed database areas. It is intended that this perspective on distributed computer system research serve as a form of survey, but more importantly to illustrate and encourage a better integration and exchange of ideas from various subareas of distributed computer system research.

Index Terms - Communications subnet, computer networks, distributed computer systems, distributed databases, distributed operating systems, distributed processing, system software.

I. INTRODUCTION

A distributed computer system (DCS) is a collection of processor-memory pairs connected by a communications subnet and logically integrated in varying degrees by a distributed operating system and/or distributed database system. The communications subnet may be a widely geographically dispersed collection of communication processors or a local area network.
The widespread use of distributed computer systems is due to the price-performance revolution in microelectronics, the development of cost-effective and efficient communication subnets (which is itself due to the merging of data communications and computer communications), the development of resource sharing software, and the increased user demands for communication, economical sharing of resources, and productivity.

A DCS potentially provides significant advantages, including good performance, good reliability, good resource sharing, and extensibility [31], [36], [56]. Potential performance enhancement is due to multiple processors and an efficient subnet, as well as to the avoidance of the contention and bottlenecks that exist in uniprocessors and multiprocessors. Potential reliability improvements are due to the data and control redundancy possible, the geographical distribution of the system, and the ability for mutual inspection of hosts and communication processors. With the proper subnet and distributed operating system, it is possible to share hardware and software resources in a cost-effective manner, increasing productivity and lowering costs.

Manuscript received May 7, 1984; revised July 14, 1984.
The author is with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003.

Possibly the most important potential advantage of a DCS is extensibility. Extensibility is the ability to easily adapt to both short and long term changes without significant disruption of the system. Short term changes include varying workloads and subnet traffic, and host or subnet failures or additions. Long term changes are associated with major modifications to the requirements of the system.

In trying to achieve the advantages of DCS's, the scope of research has been very broad. In spite of this, there is a relatively small number of fundamental issues dominating the field.
Solutions to these fundamental issues have not yet been consolidated in a comprehensive way, thereby thwarting the full potential of DCS's.

After a brief overview of DCS research (Section II), this paper provides a perspective on six fundamental DCS issues (the object model, access control, distributed control, reliability, heterogeneity, and efficiency), identifies problems associated with these issues, shows how these issues interrelate, and describes how they are addressed in different subareas of DCS research (Section III). It is intended that this perspective on DCS research serve as a form of survey, but more importantly to illustrate and encourage a better integration and exchange of ideas from various subareas of DCS research. To keep the scope of this paper reasonable, two fundamental issues, research in the theory and specification of distributed systems and the need for a distributed systems methodology, are not specifically discussed. A theory of distributed systems is needed to better understand theoretical limitations and complexity. Specification languages must be extended to better treat parallelism, reliability, the distributed nature of the system being specified, and the correctness of the system. A methodology for the design, construction, and maintenance of large complex distributed systems is necessary. This methodology must address the specific problems of DCS's such as distribution and parallelism. Finally, Section IV contains summary remarks.

0018-9340/84/1200-1102$01.00 © 1984 IEEE

II. DISTRIBUTED COMPUTER SYSTEMS

DCS research encompasses many areas, including the communication subnet, local area networks, distributed operating systems, distributed databases, concurrent and distributed programming languages, specification languages for concurrent systems, theory of parallel algorithms, parallel architectures and interconnection structures, fault tolerant and ultrareliable systems, distributed real-time systems, cooperative problem solving techniques of artificial intelligence, distributed debugging, distributed simulation, and distributed applications [23], [47], [89]. There are also the