Throughput Unfairness in Dragonfly Networks under Realistic Traffic Patterns

Pablo Fuentes, Enrique Vallejo, Cristóbal Camarero, Ramón Beivide
University of Cantabria, Spain
{pablo.fuentes, enrique.vallejo, cristobal.camarero, ramon.beivide}@unican.es

Mateo Valero
Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC), Spain
mateo@bsc.es

This is an earlier accepted version; a final version of this work can be found in the proceedings of the 2015 IEEE International Conference on Cluster Computing (CLUSTER) under DOI 10.1109/CLUSTER.2015.136. Copyright belongs to IEEE.

Abstract—Dragonfly networks arrange the network routers in a two-level hierarchy and offer a competitive cost-performance solution for large systems. Non-minimal adaptive routing is employed to fully exploit path diversity and increase performance under adversarial traffic patterns. Throughput unfairness prevents a balanced use of resources across the network nodes and severely degrades the performance of any application running on an affected node. Previous works have demonstrated the presence of throughput unfairness in Dragonflies under certain adversarial traffic patterns, and proposed different alternatives to effectively combat this effect. In this paper we introduce a new traffic pattern, denoted adversarial consecutive (ADVc), which portrays a real use case, and evaluate its impact on network performance and throughput fairness. This traffic pattern is the most adversarial in terms of network fairness. Our evaluations, both with and without transit-over-injection priority, show that global misrouting policies do not properly alleviate this problem. Therefore, explicit fairness mechanisms are required for these networks.

I. INTRODUCTION

Dragonfly networks are considered one of the most promising network topologies for upcoming Exascale systems, and have been employed in the PERCS [1] and Cascade [2] system networks. Unfortunately, these networks easily suffer congestion under certain adversarial traffic patterns. To overcome bandwidth limitations and fully exploit path diversity, non-minimal adaptive routing mechanisms are required. These routing mechanisms divert traffic through a randomly chosen intermediate node before sending it minimally towards the destination, which improves the utilization of the inter-group (global) links when a link on the minimal path is saturated.

Throughput unfairness was identified in [3] under an adversarial traffic pattern (ADV) that heavily congests one router in every group. A new global misrouting policy named Mixed-mode (MM) was proposed to select the intermediate group of the non-minimal path for in-transit adaptive routing mechanisms. The MM global misrouting policy provides competitive throughput and latency while avoiding unfairness in the bottleneck router of the group.

Previous evaluations focused on random traffic based on the Uniform (UN) and Adversarial (ADV) traffic patterns. UN represents a best case that is useful to evaluate the topological properties of the network, and is considered a good approximation of the average behavior of several applications, such as data-intensive ones. ADV represents a corner case that could occur when an application is spread over two (or more) different groups of the Dragonfly network. While these two traffic patterns cover the extreme cases with respect to routing, they do not represent the complete spectrum of traffic patterns in a Dragonfly network. In this work, we identify a new traffic pattern, designated Adversarial consecutive (ADVc), and justify its potential occurrence in a real system.
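The following Python sketch illustrates how a pattern like ADVc can funnel the minimal traffic of a whole group through a single router. It rests on assumptions that are not stated in the text above: a Dragonfly with a routers per group, h global links per router and g = a*h + 1 groups; global links laid out in a "palm-tree" arrangement, in which router r of group i reaches groups i + r*h + 1 through i + (r+1)*h; and ADVc read as every node in group i sending to groups i+1, ..., i+h. The helper names (global_link, exit_router, minimal_exit_load) are hypothetical and chosen only for this illustration.

```python
# Sketch: which router of a source group carries minimal (one global hop)
# traffic under UN, ADV and ADVc. ASSUMPTIONS: palm-tree global link
# arrangement, g = a*h + 1 groups, ADVc = group i sends to groups i+1..i+h.
from collections import Counter


def global_link(a: int, h: int, group: int, router: int, port: int) -> int:
    """Hypothetical helper: destination group of global port `port`
    (0..h-1) on router `router` (0..a-1) of `group` (palm-tree layout)."""
    g = a * h + 1
    return (group + router * h + port + 1) % g


def exit_router(a: int, h: int, src: int, dst: int) -> int:
    """Hypothetical helper: router of group `src` that owns the single
    global link towards group `dst` in the palm-tree layout."""
    g = a * h + 1
    offset = (dst - src) % g  # in 1..g-1 when dst != src
    if offset == 0:
        raise ValueError("source and destination group must differ")
    return (offset - 1) // h


def minimal_exit_load(a: int, h: int, src: int, dst_groups) -> Counter:
    """Count, per router of group `src`, how many destination groups are
    reached minimally through that router's global links."""
    return Counter(exit_router(a, h, src, d) for d in dst_groups)


if __name__ == "__main__":
    a, h = 4, 2  # illustrative sizes: g = 9 groups
    g = a * h + 1
    src = 0
    un = [d for d in range(g) if d != src]          # every other group
    adv = [(src + 1) % g]                           # ADV: next group only
    advc = [(src + k) % g for k in range(1, h + 1)]  # assumed ADVc
    print("UN  :", dict(minimal_exit_load(a, h, src, un)))
    print("ADV :", dict(minimal_exit_load(a, h, src, adv)))
    print("ADVc:", dict(minimal_exit_load(a, h, src, advc)))
```

With a = 4 and h = 2, the script prints {0: 2, 1: 2, 2: 2, 3: 2} for UN, {0: 1} for ADV and {0: 2} for ADVc. In other words, UN spreads minimal traffic evenly over the four routers of group 0, ADV sends it through one global link of router 0, and ADVc sends it through both global links of that same router 0. Under these assumptions, ADVc has h global links available instead of one, which fits the statement that it is less adversarial than ADV in throughput, yet all minimal traffic of the group still converges on a single router.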
Under ADVc, traffic is sent to several destination groups whose minimal paths meet in a single router. This pattern is less adversarial than ADV in terms of throughput, but generates the maximum unfairness under both source and in-transit adaptive routing mechanisms. The ADVc traffic pattern threatens the benefits of the MM policy, because the minimal and non-minimal paths in the bottleneck router overlap, as detailed in Section III. In this work, we demonstrate that none of the previous routing mechanisms or global misrouting policies prevents throughput unfairness under this traffic pattern. We additionally evaluate the impact of prioritizing transit over injection traffic at the router allocator, and observe that it achieves slightly higher throughput at the cost of lower fairness.

In summary, our main contributions are:

• We highlight the pitfall of optimizing routing mechanisms exclusively for corner cases rather than for the general case. In particular, we identify a new adversarial traffic pattern, Adversarial consecutive (ADVc), and explain how it corresponds to a use case in an actual system and how it differs from both UN and ADV.
• We quantify the impact of the routing mechanism and of transit-over-injection priority on throughput, latency and unfairness under different traffic patterns, including ADVc.
• We demonstrate the inability of previous global misrouting policies to prevent throughput unfairness un-