Design Rule Hierarchies and Parallelism in Software Development Tasks

Sunny Wong, Yuanfang Cai, Giuseppe Valetto, Georgi Simeonov, and Kanwarpreet Sethi
Department of Computer Science
Drexel University
Philadelphia, PA, USA
{sunny, yfcai, valetto, ges37, kss33}@cs.drexel.edu

Abstract—As software projects continue to grow in scale, it becomes increasingly important to maximize the work that developers can carry out in parallel as a set of concurrent development tasks, without incurring excessive coordination overhead. Prevailing design models, however, are not explicitly conceived to suggest how development tasks on the software modules they describe can be effectively parallelized. In this paper, we present a design rule hierarchy based on the assumption relations among design decisions. Software modules located within the same layer of the hierarchy suggest independent, hence parallelizable, tasks. Dependencies between layers or within a module suggest the need for coordination during concurrent work. We evaluate our approach by investigating the source code and mailing list of Apache Ant. We observe that, as predicted, technical communication between developers working on different modules within the same hierarchy layer is significantly less than communication between developers working across layers.

Keywords-software architecture; collaboration; project management

I. INTRODUCTION

In today's large-scale, distributed software development projects, it is increasingly crucial to maximize the level of concurrency among development tasks while avoiding excessive coordination overhead among the teams assigned to concurrent work. It has long been recognized that software modularization plays a critical role in streamlining project coordination, as the need for coordination among developers is closely related to the dependencies between system modules [1], [2].
Numerous researchers have explored the interplay between coordination (in particular in the form of personal communication) and modularization in large-scale software systems [3]–[7]. Still, prevailing design models, such as UML, are not equipped with formal means to give software project managers explicit guidance on how development tasks can be constructed, partitioned, and assigned to maximize the parallelization of developers' work, based on the dependency relations among the software modules they describe.

Parnas's information hiding principle [1] and Baldwin and Clark's design rule theory [8] provide key (although non-operational) insights into the relation between software modularization and task assignment. Parnas defined a module as an independent task assignment, a concept that is not equivalent to the conventional understanding of modules as structural constructs, such as functions or classes. Baldwin and Clark define design rules as stable decisions that decouple otherwise coupled decisions. Examples of design rules include abstract interfaces and application programming interfaces (APIs). The more subordinate decisions depend on a design rule, the more influential it is, and the more important it is to keep it stable.

Identifying design rules and their impact scopes is not trivial in large-scale systems. In this paper, we present an approach to automatically cluster a software dependency structure into a design rule hierarchy (DRH) that manifests Parnas's and Baldwin and Clark's definitions of module and design rule. In this hierarchy, the decisions within the top layer are the most influential design rules, which dominate the rest of the system and need to be kept stable. The decisions within each subsequent layer assume design decisions in previous layers. The design decisions within each layer are clustered into modules.
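The layering described above can be illustrated with a deliberately simplified sketch. It assumes the dependency structure is already available as a plain mapping from each design decision to the decisions it assumes. It treats each strongly connected component (a set of mutually dependent decisions) as a module, and places each module one layer below the deepest module it depends on, so that layer 0 holds decisions that assume nothing else. This is not the authors' DRH clustering algorithm, which operates on richer dependency models; the function names and the example decisions below are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Return the strongly connected components of `graph` (Tarjan's algorithm).

    `graph` maps every node to the set of nodes it depends on.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop all of its members off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return components


def design_rule_layers(dependencies):
    """Group decisions into modules (SCCs) and order the modules into layers.

    Layer 0 holds modules that depend on nothing; a module in layer k depends
    only on modules in layers < k, with at least one in layer k - 1.
    """
    nodes = set(dependencies) | {d for deps in dependencies.values() for d in deps}
    graph = {v: set(dependencies.get(v, ())) for v in nodes}
    modules = strongly_connected_components(graph)
    module_of = {v: m for m in modules for v in m}

    layer = {}

    def layer_of(m):
        # Longest-path depth in the (acyclic) graph of modules, memoized.
        if m not in layer:
            deps = {module_of[w] for v in m for w in graph[v]} - {m}
            layer[m] = 1 + max((layer_of(d) for d in deps), default=-1)
        return layer[m]

    by_layer = defaultdict(list)
    for m in modules:
        by_layer[layer_of(m)].append(sorted(m))
    return [sorted(by_layer[k]) for k in sorted(by_layer)]


# Hypothetical decisions: each key lists the decisions it assumes (depends on).
example_dependencies = {
    "api": [],
    "db_interface": [],
    "ui": ["api"],
    "db_impl": ["db_interface"],
    "lexer": ["parser", "api"],
    "parser": ["lexer", "api"],
    "app": ["ui", "db_impl"],
}

layers = design_rule_layers(example_dependencies)
for k, mods in enumerate(layers):
    print(k, mods)
# 0 [['api'], ['db_interface']]
# 1 [['db_impl'], ['lexer', 'parser'], ['ui']]
# 2 [['app']]
```

In this toy example, `ui`, `db_impl`, and the mutually dependent `lexer`/`parser` module share layer 1 and none of them depends on another, so under this reading they are candidates for parallel assignment; `app` assumes decisions in layer 1 and therefore sits one layer further down.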
Since modules within the same layer are independent of each other, they become candidates for concurrent implementation.

We hypothesize that this hierarchy, populated with sufficient dependency relations, can shed light on the interplay between software structure, task parallelism, and developers' coordination needs. Concretely, the DRH predicts that developers working on different modules within the same layer do not have communication requirements [3], whereas dependencies between modules located in different layers, or within the same module, create communication requirements among developers working in those contexts.

The accuracy of the DRH predictions on coordination requirements fundamentally depends on the quality of the underlying model of software dependency. For instance, Cataldo et al. [9] show that syntactic dependencies extracted from source code are less effective than semantic relationships at identifying coordination requirements. We recently developed an approach to precisely define and automatically derive pairwise dependency relations (PWDR) from a formal model called the augmented constraint network (ACN) [10], [11]. An ACN expresses