Experience Developing Software Using a Globally Distributed Workforce

Alberto Avritzer
Siemens Corporate Research
755 College Road East
Princeton, NJ 08540
alberto.avritzer@siemens.com

Thomas Ostrand, Elaine J. Weyuker
AT&T Labs - Research
180 Park Avenue
Florham Park, NJ 07932
{ostrand, weyuker}@research.att.com

Abstract

Industrial experience assessing the stability of a large mission-critical software project is reported. We observed that the project incurred significant additional delays in resolving the types of problems usually uncovered when assessing mission-critical software stability. We present plausible hypotheses about the possible causes of these additional delays.

1 Introduction

The availability of a global market for highly qualified, lower-cost technical labor has created an environment in which mission-critical, high-availability software can be developed using global software development processes. In these environments, it is common to find great diversity among development and testing organizations. These organizations may be dispersed across several continents, speak different languages, and uphold significantly different cultural values. The expectation is that by producing software systems using these lower-cost but highly skilled technical staff members, the total cost of the resulting systems will be significantly lower than it would be if they were produced entirely by the organizations that have traditionally been used, while the quality of the resulting software systems remains acceptably high.

Offsetting this expected advantage of using a globally distributed software development team is the awareness that communication barriers often exist between different cultures, even under normal operating conditions.
One potential danger is that when critical problems are uncovered in high-value, mission-critical systems, the communication barriers may become insurmountable and could ultimately lead to project failure.

An interesting example of this effect arises when testers uncover problems that are difficult to resolve and may have a significant impact on customer satisfaction. The resolution of these problems often requires extensive communication between testing and development organizations. In addition, because development organizations sometimes need to take expensive corrective action as a result of the issues found by the testing organization, project success requires a very high level of trust between development and testing organizations.

We have observed that a lack of trust between development and testing organizations may leave important problems unresolved for several weeks. As a result, these problems may impact both functional and non-functional system dimensions such as scalability, availability, reliability, and performance. Therefore, we recommend that software development projects pay particular attention to the establishment of trust between development and testing organizations. This is always an important issue for those charged with producing high-quality software systems. However, when these organizations are widely dispersed geographically, there are additional forces that tend to exacerbate difficulties.

We describe experiences observed while assessing the stability of a software system produced by an international company with various development organizations located in North America, Europe, and Asia. This product is designed to monitor a distributed control system, and is being developed globally at locations in Europe and North America. For this system, software stability was assessed in terms of the maximum observed continuous time of operation without critical failures.
The assessment was made through dynamic performance testing designed specifically to determine the system's stability. More details about the system will be presented in Section 4.

This mission-critical system has a formal reliability requirement stating that the probability of operating without critical failures for eight hours should be greater than 0.99. The relationship between reliability, failure intensity, and mission time is given by the formula:

IEEE International Conference on Global Software Engineering (ICGSE'06) 0-7695-2663-2/06 $20.00 © 2006
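
To make the eight-hour reliability requirement concrete, the following is a minimal sketch that assumes the common constant-failure-intensity (exponential) model, in which reliability equals exp(-failure intensity × mission time). The choice of model, the function names, and the parameter names are assumptions introduced here for illustration and are not taken from the text above. Under that assumption, the sketch computes the largest failure intensity compatible with a reliability of 0.99 over an eight-hour mission.

```python
import math


def reliability(failure_intensity: float, mission_time: float) -> float:
    """Probability of operating without critical failure for mission_time.

    Assumes a constant failure intensity (exponential model). This is an
    illustrative assumption, not a formula quoted from the paper.
    """
    return math.exp(-failure_intensity * mission_time)


def max_failure_intensity(target_reliability: float, mission_time: float) -> float:
    """Largest constant failure intensity that still meets the target."""
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must be in (0, 1)")
    if mission_time <= 0.0:
        raise ValueError("mission_time must be positive")
    # Solve exp(-intensity * time) = target for intensity.
    return -math.log(target_reliability) / mission_time


if __name__ == "__main__":
    # Requirement from the text: reliability above 0.99 for eight hours.
    limit = max_failure_intensity(0.99, 8.0)  # critical failures per hour
    print(f"max failure intensity: {limit:.6f} per hour")
    print(f"implied mean time between critical failures: {1.0 / limit:.0f} hours")
    print(f"check: reliability over 8 hours = {reliability(limit, 8.0):.4f}")
```

If the exponential assumption holds, the requirement corresponds to at most about one critical failure per 800 hours of operation, which gives a sense of how long a stability test must run before it can say much about compliance.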